mhjabreel / CharCNN

MIT License
234 stars 99 forks

data source csvs #17

Open davecampbell opened 3 years ago

davecampbell commented 3 years ago

do the csvs happen to be available anywhere - public dataset somewhere? https://github.com/mhjabreel/CharCNN/tree/master/data/ag_news_csv

they are mentioned here: https://github.com/johnb30/py_crepe

i could reassemble them given the text files in /data, but would hope to not introduce some oddity by mishandling double-quotes or something like that.
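
For anyone rebuilding the CSVs by hand, here is a minimal sketch of the quoting concern. It assumes a three-column layout (class index, title, description) and uses made-up sample rows; `write_rows` and `read_rows` are hypothetical helper names, not part of either repo. Python's standard `csv` module doubles embedded quotes on write and restores them on read, so a round trip should leave the text unchanged:

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write (class_index, title, description) rows, quoting every field."""
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)


def read_rows(fileobj):
    """Read rows back; the csv module un-doubles embedded quotes."""
    return [tuple(row) for row in csv.reader(fileobj)]


# Made-up sample rows containing embedded double quotes and a comma.
rows = [
    ("3", 'Oil "spike" rattles markets', "Prices rose, then fell."),
    ("4", "Chip maker ships new part", 'Analysts called it "overdue".'),
]

buffer = io.StringIO()
write_rows(rows, buffer)
buffer.seek(0)

# The round trip should reproduce the original rows exactly.
assert read_rows(buffer) == rows
```

Quoting every field (`csv.QUOTE_ALL`) is a choice made for this sketch; whether it matches the layout the training code expects would need to be checked against the loader.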

mhjabreel commented 3 years ago

Hi, you can find the datasets in the Hugging Face Datasets library. Please check this URL:

https://huggingface.co/docs/datasets/loading_datasets.html
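
A rough sketch of how that could look in practice, with several assumptions: that the dataset is published under the id `ag_news`, that each record has a `text` field and an integer `label` field, and that the `datasets` package is installed. `records_to_csv` and `export_ag_news` are hypothetical helper names, and the output filename is arbitrary. The labels in the hosted copy may be numbered differently from the original CSVs, so check them before training:

```python
import csv


def records_to_csv(records, path):
    """Write records with "label" and "text" keys as two-column CSV rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        for record in records:
            writer.writerow([record["label"], record["text"]])


def export_ag_news(path="ag_news_train.csv"):
    """Download the train split and dump it to CSV.

    Assumption: the dataset id is "ag_news" and records expose
    "text" and "label". Needs `pip install datasets` and network access.
    """
    from datasets import load_dataset

    train = load_dataset("ag_news", split="train")
    records_to_csv(train, path)
```

Calling `export_ag_news()` would perform the download; `records_to_csv` works on any iterable of dicts, so it can be tried offline first.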

davecampbell commented 3 years ago

thank you SO much - what a great resource! looks like the py_crepe project was reading data files before the title and description were combined, so i can adjust that helper code to use the more recent format. i'm trying to see if i can get that to run for me - and then progress to this repo to fully appreciate what you have here.
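
One way such a helper adjustment might look, as a sketch only: `combine_fields` is a hypothetical function, not py_crepe's actual code, and joining title and description with a single space is an assumption about how the combined text was built.

```python
def combine_fields(row):
    """Return (label, text) from either row layout.

    Three columns: (label, title, description), joined with a space.
    Two columns: (label, text), passed through unchanged.
    """
    if len(row) == 3:
        label, title, description = row
        return label, f"{title} {description}"
    if len(row) == 2:
        label, text = row
        return label, text
    raise ValueError(f"expected 2 or 3 columns, got {len(row)}")


# Both layouts normalize to the same (label, text) pair.
old_style = ("2", "Match report", "The home side won.")
new_style = ("2", "Match report The home side won.")
assert combine_fields(old_style) == combine_fields(new_style)
```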

i hope to use it on a non-nlp project related to gene sequences, but i am just an ML beginner so i have to take everything step-by-step.