zipperer closed this issue 5 months ago
Files for experimenting:
Columns for experimenting: filename | date | timestamp | book | chapter | verse
filename repeated for each entry
date -- I can make the column type DATE and then insert a timestamp string; Postgres keeps just the date part. E.g.:
CREATE TABLE my_timestamp_table (
    id SERIAL PRIMARY KEY,
    my_timestamp TIMESTAMP,
    my_date DATE
);
INSERT INTO my_timestamp_table (my_timestamp, my_date) VALUES (
    'Mon 4 Nov 2013 23:21:49',
    'Mon 4 Nov 2013 23:21:49'
);
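The same string can be split into its timestamp and date parts on the python side before inserting, which is one way to fill both columns. A minimal sketch (the format string is an assumption matching the example value above):

```python
from datetime import datetime

# Parse a raw entry timestamp like the one in the INSERT above.
raw = 'Mon 4 Nov 2013 23:21:49'
ts = datetime.strptime(raw, '%a %d %b %Y %H:%M:%S')

# One parsed value can feed both columns.
print(ts.isoformat())         # value for the TIMESTAMP column: 2013-11-04T23:21:49
print(ts.date().isoformat())  # value for the DATE column: 2013-11-04
```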
timestamp | book | chapter | verse -- remaining columns
Options: make one csv per file; make one csv file and append to it for each input file; or insert directly into the database from the python program.
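The append-to-one-csv option can be sketched with the stdlib csv module (the row tuples here are made-up illustrations, not actual BDA entries; io.StringIO stands in for a real file opened in append mode):

```python
import csv
import io

# Hypothetical rows already parsed from one BDA .txt file:
# (filename, date, timestamp, book, chapter, verse)
rows = [
    ('BDA 2014 1 01', '2014-01-01', '2014-01-01 00:00:25', 'Romans', 8, 29),
    ('BDA 2014 1 01', '2014-01-01', '2014-01-01 00:01:00', 'Romans', 8, 28),
]

# Stand-in for open('entries.csv', 'a', newline=''): append mode lets each
# input file add to one combined csv; mode 'w' per file gives one csv each.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)
print(buf.getvalue())
```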
Current approach:
Columns: filename | date | timestamp | book | chapter | verse
CREATE TABLE bible_display_app_file_entries (
filename VARCHAR,
filename_date DATE,
entry_timestamp TIMESTAMP,
book VARCHAR,
chapter INT,
verse INT
);
I have converted each .txt file into a .csv and copied each .csv file into a db table.

postgres=# SELECT presentations_decoded_date.id, bible_display_app_file_entries.entry_timestamp, bible_display_app_file_entries.book, bible_display_app_file_entries.chapter, bible_display_app_file_entries.verse FROM presentations_decoded_date JOIN bible_display_app_file_entries ON bible_display_app_file_entries.filename_date = presentations_decoded_date.presentation_date;
id | entry_timestamp | book | chapter | verse
------+---------------------+----------------+---------+-------
3191 | 2014-01-01 00:00:25 | Romans | 8 | 29
3191 | 2014-01-01 00:01:00 | Romans | 8 | 28
3191 | 2014-01-01 00:01:14 | Romans | 8 | 29
3191 | 2014-01-01 00:01:53 | Romans | 8 | 29
3191 | 2014-01-01 00:01:54 | Romans | 8 | 29
3191 | 2014-01-01 00:01:54 | Romans | 8 | 29
3191 | 2014-01-01 00:05:42 | Romans | 7 | 20
...
script that converts the .txt into a .csv: bible_display_app_file_to_csv.py
example output csv file: BDA 2014 1 01.csv
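The filename_date column can be decoded from the file's name. A sketch, assuming the naming scheme is 'BDA YEAR MONTH DAY' as the example filename suggests (the helper name filename_to_date is my own):

```python
from datetime import date

def filename_to_date(stem):
    """Decode a file name stem like 'BDA 2014 1 01' (assumed: BDA YEAR MONTH DAY)."""
    _, year, month, day = stem.split()
    return date(int(year), int(month), int(day))

print(filename_to_date('BDA 2014 1 01'))  # 2014-01-01
```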
interactions for making table and copying data into table: docker_postgres_interactions_copy_into_table_and_query.txt
Joining after adding the entries from BDA 2014 10 05.csv shows we will get duplicate rows by matching just on day and not on 'morning' or 'night':
3302 | 2014-10-05 00:00:33 | Ephesians | 2 | 10
3301 | 2014-10-05 00:00:33 | Ephesians | 2 | 10
3302 | 2014-10-05 00:01:53 | Romans | 7 | 25
3301 | 2014-10-05 00:01:53 | Romans | 7 | 25
3302 | 2014-10-05 00:01:56 | Romans | 8 | 1
See issues about splitting file into morning file and evening file for each file that has both.
The experiments above for this issue satisfy the initial goal of getting entries into the database. So I close this issue.
Future issues will address the open questions of:
Iterate through BDA files. For each BDA file, for each entry in file, make row: FILENAME, DAY, TIMESTAMP, BOOK, CHAPTER, VERSE.
While experimenting, restrict to a few files: decide what to do for all files, but test on just a few.
Want enough files and entries to be able to join the three tables: 1. presentations, 2. BDA filenames, 3. BDA entries.
Do we need all three tables? We currently link date -> date from 1 to 2. We could link date -> date from 1 to 3.
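The 1 -> 3 link can be tried in miniature before changing the Postgres schema. A sketch using sqlite3 with mocked-up rows (table and column names mirror the schema above; the data is invented for illustration):

```python
import sqlite3

# In-memory mock of table 1 (presentations) and table 3 (BDA entries),
# joined directly on date, skipping table 2 (BDA filenames).
con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE presentations_decoded_date (id INTEGER, presentation_date TEXT);
CREATE TABLE bible_display_app_file_entries (
    filename TEXT, filename_date TEXT, entry_timestamp TEXT,
    book TEXT, chapter INTEGER, verse INTEGER);
INSERT INTO presentations_decoded_date VALUES (3191, '2014-01-01');
INSERT INTO bible_display_app_file_entries VALUES
    ('BDA 2014 1 01', '2014-01-01', '2014-01-01 00:00:25', 'Romans', 8, 29);
""")
rows = con.execute("""
    SELECT p.id, e.entry_timestamp, e.book, e.chapter, e.verse
    FROM presentations_decoded_date p
    JOIN bible_display_app_file_entries e
      ON e.filename_date = p.presentation_date
""").fetchall()
print(rows)  # [(3191, '2014-01-01 00:00:25', 'Romans', 8, 29)]
```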