tejasph / care_nlp_psych

Cancer Ambulatory Resource Enhancement through early prediction of ER visits
MIT License

Bow Content Processing #16

Closed tejasph closed 9 months ago

tejasph commented 9 months ago

Building the BoW (bag-of-words) network from the ground up, using main.py from models.bow.

Started by running the tokenizer via data_processing, a new folder I added for clarity (it was called data in JJ's project). The data folder in this repo now holds only datasets, and it is listed in the .gitignore for privacy reasons.

BoW vectorization works with the toy data, and the environment yml has been updated. I am using .sh scripts, which work on Mac, because the .bat files were incompatible. I also had trouble using JJ's pretrained vectorizers, which was unfortunate. I'm not sure why yet.

I still plan to add comments to the file for future clarity.
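
For anyone following along, here is a minimal sketch of what BoW vectorization on toy data looks like. This is not the code in models.bow or data_processing. The function names (tokenize, build_vocab, vectorize) and the toy sentences are hypothetical, and it uses only the Python standard library rather than whatever vectorizer the project actually uses.

```python
import re
from collections import Counter

# Hypothetical toy documents, not taken from the project's datasets.
TOY_DOCS = [
    "patient reports pain",
    "patient reports no pain",
    "no new symptoms",
]


def tokenize(text):
    """Lowercase the text and return its runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(docs):
    """Map each distinct token to a column index, sorted alphabetically."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    """Return one count per vocabulary token; unseen tokens are ignored."""
    counts = Counter(tokenize(doc))
    # dicts keep insertion order, so this follows the column indices
    return [counts.get(tok, 0) for tok in vocab]


vocab = build_vocab(TOY_DOCS)
vectors = [vectorize(doc, vocab) for doc in TOY_DOCS]
```

Each row of vectors has one entry per vocabulary token, so every document maps to a fixed-length count vector regardless of its length. Word order is discarded, which is the defining trade-off of a bag-of-words representation.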