Changes:
- EDA: folder with scripts for, well, you guessed it, Exploratory Data Analysis.
- extraction: consolidated scripts for extraction.
- packaging: scripts for tarring, uploading, getting links, filtering, etc. (all using multiprocessing over the different languages).
- embedding/word_extraction.py: a new function has been added here. It can be consolidated into the older one; I made a new one to avoid disturbing any existing code.
- Updated .gitignore to exclude pickles, PNGs, PDFs, etc.
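
For reviewers unfamiliar with the packaging pattern, here is a minimal sketch of what "multiprocessing over the different languages" could look like for the tarring step. It is not the code in this PR: the directory layout (one sub-folder per language under a data root) and the names `tar_language`, `build_jobs`, and `tar_all_languages` are assumptions made for illustration only.

```python
import tarfile
from multiprocessing import Pool
from pathlib import Path


def tar_language(job):
    """Pack one language directory into <out_dir>/<lang>.tar.gz; return the archive path."""
    lang_dir, out_dir = job
    lang = Path(lang_dir).name
    archive = Path(out_dir) / f"{lang}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the language folder
        tar.add(lang_dir, arcname=lang)
    return str(archive)


def build_jobs(data_root, out_dir):
    """One (language_dir, out_dir) job per sub-folder of data_root (assumed layout)."""
    return [(str(d), str(out_dir)) for d in sorted(Path(data_root).iterdir()) if d.is_dir()]


def tar_all_languages(data_root, out_dir, processes=4):
    """Tar every language folder in parallel, one worker task per language."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with Pool(processes=processes) as pool:
        return pool.map(tar_language, build_jobs(data_root, out_dir))
```

When run as a script, the call to `tar_all_languages(...)` should sit under an `if __name__ == "__main__":` guard so that worker processes do not re-run it on import. One task per language keeps the workers independent, which is why the same shape would also fit the uploading and filtering steps.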