zipperer / migrate_database_with_michael

0 stars 0 forks source link

Make table with columns 1. BDA filename 2. entry from BDA filename #42

Closed zipperer closed 5 months ago

zipperer commented 5 months ago

Iterate through BDA files. For each BDA file, for each entry in file, make row: FILENAME, DAY, TIMESTAMP, BOOK, CHAPTER, VERSE.

While experimenting, maybe restrict to a few files. Determine what to do for all files, but restrict tests to just a few.

Want enough files and entries to be able to join the three tables: 1. presentations, 2. BDA filenames, 3. BDA entries.

Do we need all three tables? We currently link date -> date from 1 to 2. We could link date -> date from 1 to 3.

zipperer commented 5 months ago

Files for experimenting:

Columns for experimenting: filename | date | timestamp | book | chapter | verse

filename repeated for each entry

date -- I can make the column type date then insert a timestamp, see e.g.

CREATE TABLE my_timestamp_table (
  id SERIAL PRIMARY KEY,
  my_timestamp TIMESTAMP,
  my_date DATE
);

INSERT INTO my_timestamp_table (my_timestamp, my_date) VALUES (
  'Mon 4 Nov 2013 23:21:49', 
  'Mon 4 Nov 2013 23:21:49');

timestamp book chapter verse

zipperer commented 5 months ago

Could make one csv per file. Could make one csv file and append to it for each file. Could insert directly into database from python program.

Current approach:

zipperer commented 5 months ago

Columns for experimenting: filename | date | timestamp | book | chapter | verse

CREATE TABLE bible_display_app_file_entries (
  filename VARCHAR,
  filename_date DATE,
  entry_timestamp TIMESTAMP,
  book VARCHAR,
  chapter INT,
  verse INT
);
zipperer commented 5 months ago

I have

ostgres=# SELECT presentations_decoded_date.id, bible_display_app_file_entries.entry_timestamp, bible_display_app_file_entries.b\
ook, bible_display_app_file_entries.chapter, bible_display_app_file_entries.verse FROM presentations_decoded_date JOIN bible_disp\
lay_app_file_entries ON bible_display_app_file_entries.filename_date = presentations_decoded_date.presentation_date;              
  id  |   entry_timestamp   |      book      | chapter | verse                                                                    
------+---------------------+----------------+---------+-------                                                                   
 3191 | 2014-01-01 00:00:25 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:01:00 | Romans         |       8 |    28                                                                    
 3191 | 2014-01-01 00:01:14 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:01:53 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:01:54 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:01:54 | Romans         |       8 |    29                                                                    
 3191 | 2014-01-01 00:05:42 | Romans         |       7 |    20
...

script that converted the .txt into a .csv: bible_display_app_file_to_csv.py

example output csv file: BDA 2014 1 01.csv

interactions for making table and copying data into table: docker_postgres_interactions_copy_into_table_and_query.txt

zipperer commented 5 months ago

Joining after adding the entries from BDA 2014 10 05.csv shows we will get duplicate rows by matching just on day and not on 'morning' or 'night'

3302 | 2014-10-05 00:00:33 | Ephesians      |       2 |    10
 3301 | 2014-10-05 00:00:33 | Ephesians      |       2 |    10
 3302 | 2014-10-05 00:01:53 | Romans         |       7 |    25
 3301 | 2014-10-05 00:01:53 | Romans         |       7 |    25
 3302 | 2014-10-05 00:01:56 | Romans         |       8 |     1
zipperer commented 5 months ago

Joining after adding the entries from BDA 2014 10 05.csv shows we will get duplicate rows by matching just on day and not on 'morning' or 'night'

See issues about splitting file into morning file and evening file for each file that has both.

zipperer commented 5 months ago

The experiments above for this issue satisfy the initial goal of getting entries into the database. So I close this issue.

Future issues will address the open questions of: