skdreier / NIrelandNLP

British Justifications for Internment without Trial: NLP Approaches to Analyzing Government Archives (Ongoing Project)
MIT License

Tasks as of 01/21 afternoon #8

Closed skdreier closed 4 years ago

skdreier commented 4 years ago

Today, Sarah created a new branch (0121_txt_management), updated the .py code that produces our justification .txt output ("justifications_compile.py"), and output a .csv file ("justifications_long_parsed.csv").

Jose lead:

Sarah lead:

jmhernan commented 4 years ago

@skdreier Here are the commands that will convert all the PDF files into .txt files.

  1. In your terminal, with your conda environment active, install the poppler package (it provides pdftotext):
    conda install -c conda-forge poppler
  2. Go into the directory with all the PDF files (we might want to try it on a subset first) and execute this command:
    for file in *.pdf; do pdftotext "$file"; done

    And magic... hopefully! Each PDF should end up with a .txt file of the same name in the same directory.
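
If we want something a bit safer to re-run on a subset, the loop above could be wrapped roughly like this. It is only a sketch: `convert_pdfs` and the `PDF_CONVERTER` variable are names made up here for illustration (they are not in the repo), and it assumes `pdftotext` is on the PATH after the conda install. It skips PDFs that already have a .txt next to them and reports any file the converter fails on.

```shell
#!/usr/bin/env bash
# Sketch only: convert_pdfs and PDF_CONVERTER are hypothetical names,
# not part of the project repo.

convert_pdfs() {
    # Directory to scan; defaults to the current directory.
    local dir="${1:-.}"
    # Command used for conversion; defaults to poppler's pdftotext.
    local converter="${PDF_CONVERTER:-pdftotext}"
    local file
    local failed=0

    for file in "$dir"/*.pdf; do
        # If no PDFs match, the glob stays literal; skip it.
        [ -e "$file" ] || continue

        # pdftotext writes foo.txt next to foo.pdf when no output name is given,
        # so an existing foo.txt means this file was already converted.
        if [ -e "${file%.pdf}.txt" ]; then
            echo "skip: $file"
            continue
        fi

        if "$converter" "$file"; then
            echo "ok: $file"
        else
            echo "FAILED: $file" >&2
            failed=$((failed + 1))
        fi
    done

    # Return non-zero if any file failed to convert.
    [ "$failed" -eq 0 ]
}
```

To try it, source the file and run `convert_pdfs path/to/subset`, where the path is whatever test folder we copy a few PDFs into. Since already-converted files are skipped, it can be run again after fixing a problem PDF without redoing everything.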