[x] Goal: In "justifications_compile.py", pull the Reference # along with the page capture coordinates for each individual page capture (from the raw text column). At this point, the code only recognizes / pulls page captures that appear as the FIRST reference for a given justification in a given document. In other words, it does not pull page captures that appear as the second or later reference for that justification on that page.
[x] Goal: Figure out how to rename .pdf files in bash/shell. Some are improperly formatted as: PREM_15_478_123.pdf, PREM_15_1689_123.pdf, or PREM_15_1010_123.pdf. They SHOULD be formatted as: IMG_0123_PREM_15_478.pdf. In other words, move the last three digits to the start of the file name, following "IMG_0" (etc.). EXAMPLE: PREM_15_1010_145.pdf should be renamed as: IMG_0145_PREM_15_1010.pdf
[x] Goal: Write a script to convert all pdf documents (6958 individual pdfs) to .txt files. These documents are stored on Google Drive:
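Rough sketch for the renaming goal above. The goal asks for bash/shell; this version is in Python instead, only to stay in the same language as "justifications_compile.py". The names OLD_NAME, corrected_name, and rename_all are made up for illustration, and the pattern assumes every misnamed file looks like PREM_<digits>_<digits>_<three digits>.pdf, as in the examples listed. It defaults to a dry run so the planned renames can be checked before anything is touched.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed pattern for misnamed files, e.g. PREM_15_1010_145.pdf:
# group 1 is the series ("PREM_15_1010"), group 2 the three-digit image number ("145").
OLD_NAME = re.compile(r"^(PREM_\d+_\d+)_(\d{3})\.pdf$")


def corrected_name(filename: str) -> Optional[str]:
    """Return the IMG_0###_PREM_... name, or None if the file is not misnamed."""
    match = OLD_NAME.match(filename)
    if match is None:
        return None
    series, image_number = match.groups()
    return f"IMG_0{image_number}_{series}.pdf"


def rename_all(folder: Path, dry_run: bool = True) -> List[Tuple[str, str]]:
    """List (old, new) name pairs; rename on disk only when dry_run is False."""
    planned = []
    for pdf in sorted(folder.glob("PREM_*.pdf")):
        new_name = corrected_name(pdf.name)
        if new_name is None:
            continue  # name does not match the assumed misnamed pattern
        target = pdf.with_name(new_name)
        if target.exists():
            continue  # never overwrite a file that already has the correct name
        planned.append((pdf.name, new_name))
        if not dry_run:
            pdf.rename(target)
    return planned
```

Suggested use: call rename_all(Path("some_folder")) first and read through the returned pairs, then run it again with dry_run=False once the list looks right.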
Sarah lead:
[x] Download all date codes from NVivo as individual .txt files. Adapt/update Jose's loop from the justification code to append them all into a single document. Then, we will merge these with the complete corpus.
Today, Sarah created a new branch (0121_txt_management), updated the .py code that processes our justification .txt output ("justifications_compile.py"), and output a .csv file, "justifications_long_parsed.csv".
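Possible starting point for appending the date-code .txt files into one document. This is not Jose's loop, just a minimal sketch of the same idea. The function name combine_txt_files, the "=== file name ===" separator line, and the UTF-8 encoding are all assumptions; the NVivo exports may need a different encoding.

```python
from pathlib import Path


def combine_txt_files(source_dir: Path, output_file: Path) -> int:
    """Append every .txt file in source_dir to output_file; return the file count."""
    count = 0
    with output_file.open("w", encoding="utf-8") as out:
        for txt in sorted(source_dir.glob("*.txt")):
            if txt.resolve() == output_file.resolve():
                continue  # skip the combined file if it lives in the same folder
            # Hypothetical separator so each chunk can be traced back to its source file.
            out.write(f"=== {txt.name} ===\n")
            body = txt.read_text(encoding="utf-8", errors="replace")
            out.write(body.rstrip("\n") + "\n\n")
            count += 1
    return count
```

Keeping the source file name in the separator line should make the later merge with the complete corpus easier, since each block can still be matched to its document.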
Jose lead:
[x] Goal: In "justifications_compile.py", pull the Reference # along with the page capture coordinates for each individual page capture (from the raw text column). At this point, the code only recognizes / pulls page captures that appear as the FIRST reference for a given justification in a given document. In other words, it does not pull page captures that appear as the second or later reference for that justification on that page.
[x] Goal: Figure out how to rename .pdf files in bash/shell. Some are improperly formatted as: PREM_15_478_123.pdf, PREM_15_1689_123.pdf, or PREM_15_1010_123.pdf. They SHOULD be formatted as: IMG_0123_PREM_15_478.pdf. In other words, move the last three digits to the start of the file name, following "IMG_0" (etc.). EXAMPLE: PREM_15_1010_145.pdf should be renamed as: IMG_0145_PREM_15_1010.pdf
[x] Goal: Write a script to convert all pdf documents (6958 individual pdfs) to .txt files. These documents are stored on Google Drive:
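Note on the Reference # goal above: capturing only the first reference is the kind of result a single-match call such as re.search would give, though that is a guess about the current code. The sketch below loops over every reference with re.finditer instead. The raw text layout it expects ("Reference N" followed by "(x1,y1) - (x2,y2)") is a hypothetical stand-in, as are the names HEADER, COORDS, and extract_references; the patterns would need adjusting to the real raw text column.

```python
import re
from typing import Dict, List

# Hypothetical layout of the raw text column (an assumption, not the confirmed format):
#   Reference 1 - 2.10% Coverage
#   (120,340) - (560,410)
HEADER = re.compile(r"Reference\s+(\d+)")
COORDS = re.compile(r"\((\d+)\s*,\s*(\d+)\)\s*-\s*\((\d+)\s*,\s*(\d+)\)")


def extract_references(raw_text: str) -> List[Dict[str, int]]:
    """Return one row per page capture, tagged with its Reference number."""
    headers = list(HEADER.finditer(raw_text))
    rows = []
    for i, header in enumerate(headers):
        # A reference's text runs from its header to the next header (or the end).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(raw_text)
        chunk = raw_text[header.end():end]
        # finditer visits every coordinate pair in the chunk, not just the first.
        for coords in COORDS.finditer(chunk):
            x1, y1, x2, y2 = (int(value) for value in coords.groups())
            rows.append({"reference": int(header.group(1)),
                         "x1": x1, "y1": y1, "x2": x2, "y2": y2})
    return rows
```

Splitting on the reference headers first keeps each set of coordinates attached to its own Reference #, even when one reference has no page capture at all.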
Sarah lead: