Make table with columns 1. BDA filename 2. entry from BDA filename

zipperer commented 5 months ago

Iterate through BDA files. For each BDA file, for each entry in file, make row: FILENAME, DAY, TIMESTAMP, BOOK, CHAPTER, VERSE.

While experimenting, maybe restrict to a few files. Determine what to do for all files, but restrict tests to just a few.

Want enough files and entries to be able to join the three tables: 1. presentations, 2. BDA filenames, 3. BDA entries.

Do we need all three tables? We currently link date -> date from 1 to 2. We could link date -> date from 1 to 3.

zipperer commented 5 months ago

Files for experimenting:

'BDA 2014 1 01.txt'
'BDA 2014 10 05.txt'
'BDA 2015 8 12.txt'

filename repeated for each entry

date -- I can make the column type date then insert a timestamp, see e.g.

CREATE TABLE my_timestamp_table (
  id SERIAL PRIMARY KEY,
  my_timestamp TIMESTAMP,
  my_date DATE
);

INSERT INTO my_timestamp_table (my_timestamp, my_date) VALUES (
  'Mon 4 Nov 2013 23:21:49', 
  'Mon 4 Nov 2013 23:21:49');

timestamp book chapter verse

extract from structure

zipperer commented 5 months ago

Could make one csv per file. Could make one csv file and append to it for each file. Could insert directly into database from python program.

Current approach:

one csv per file
confirm file looks good
write method that writes row to database rather than writing row to file
make interface so that can write to file or write to database for a single new file
make procedure to write entries to db from all files in directory

zipperer commented 5 months ago

Columns for experimenting: filename | date | timestamp | book | chapter | verse

CREATE TABLE bible_display_app_file_entries (
  filename VARCHAR,
  filename_date DATE,
  entry_timestamp TIMESTAMP,
  book VARCHAR,
  chapter INT,
  verse INT
);

zipperer commented 5 months ago

I have

converted one .txt file into a .csv
copied the one .csv file into a db table
queried for presentation id and the timestamped entries, e.g.

ostgres=# SELECT presentations_decoded_date.id, bible_display_app_file_entries.entry_timestamp, bible_display_app_file_entries.b\
ook, bible_display_app_file_entries.chapter, bible_display_app_file_entries.verse FROM presentations_decoded_date JOIN bible_disp\
lay_app_file_entries ON bible_display_app_file_entries.filename_date = presentations_decoded_date.presentation_date;              
  id  |   entry_timestamp   |      book      | chapter | verse                                                                    
------+---------------------+----------------+---------+-------                                                                   
 3191 | 2014-01-01 00:00:25 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:01:00 | Romans         |       8 |    28                                                                    
 3191 | 2014-01-01 00:01:14 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:01:53 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:01:54 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:01:54 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:05:42 | Romans         |       7 |    20
...

script that converted the .txt into a .csv: bible_display_app_file_to_csv.py

example output csv file: BDA 2014 1 01.csv

interactions for making table and copying data into table: docker_postgres_interactions_copy_into_table_and_query.txt

zipperer commented 5 months ago

Joining after adding the entries from BDA 2014 10 05.csv shows we will get duplicate rows by matching just on day and not on 'morning' or 'night'

3302 | 2014-10-05 00:00:33 | Ephesians      |       2 |    10
 3301 | 2014-10-05 00:00:33 | Ephesians      |       2 |    10
 3302 | 2014-10-05 00:01:53 | Romans         |       7 |    25
 3301 | 2014-10-05 00:01:53 | Romans         |       7 |    25
 3302 | 2014-10-05 00:01:56 | Romans         |       8 |     1

zipperer commented 5 months ago

Joining after adding the entries from BDA 2014 10 05.csv shows we will get duplicate rows by matching just on day and not on 'morning' or 'night'

See issues about splitting file into morning file and evening file for each file that has both.

zipperer commented 5 months ago

The experiments above for this issue satisfy the initial goal of getting entries into the database. So I close this issue.

Future issues will address the open questions of:

Do we plan to make a csv for each file of entries?
How will we split a file into the entries for the morning and entries for the night?
Do we need a third category 'night Q and A' in addition to 'morning' and 'night'?
Separate interfaces for (a) taking one file of entries and making a csv and (b) iterating over all files in directory and making a csv for each.

zipperer / migrate_database_with_michael

Make table with columns 1. BDA filename 2. entry from BDA filename #42