pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

How to load AG_NEWS data from local files #1316

Open robbenplus opened 3 years ago

robbenplus commented 3 years ago


I can't fetch the AG_NEWS data online with train_iter, test_iter = AG_NEWS(split=('train', 'test')) because of my bad connection. So I downloaded train.csv and test.csv manually into my local folder AG_NEWS from these URLs: 'train': "https://raw.githubusercontent.com/mhjabreel/CharCnn_Keras/master/data/ag_news_csv/train.csv", 'test': "https://raw.githubusercontent.com/mhjabreel/CharCnn_Keras/master/data/ag_news_csv/test.csv"

After that I tried to load the AG_NEWS data with train_iter, test_iter = AG_NEWS(root='./AG_NEWS', split=('train', 'test')), but it throws an exception: RuntimeError: The hash of /myfolder/AG_NEWS/train.csv does not match. Delete the file manually and retry.

My directory layout is:

myfolder
└───AG_NEWS
    ├───train.csv
    └───test.csv
parmeet commented 3 years ago

Hi @robbenplus, could you try specifying your root as './myfolder'? (The dataset appends AG_NEWS to the root path internally.)
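
The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the library's own code: root_for and load_local_ag_news are hypothetical helper names, and the sketch assumes, as stated in the reply, that the dataset appends AG_NEWS to whatever root it is given. It does not address whether the downloaded files pass the hash check.

```python
import os


def root_for(dataset_dir, dataset_name="AG_NEWS"):
    """Hypothetical helper: given the folder that holds train.csv and
    test.csv, return the parent directory to pass as root=...

    Assumes the dataset appends `dataset_name` to the root internally,
    as described in the reply above.
    """
    dataset_dir = os.path.abspath(dataset_dir)
    if os.path.basename(dataset_dir) != dataset_name:
        raise ValueError(
            f"expected a folder named {dataset_name!r}, got {dataset_dir!r}"
        )
    return os.path.dirname(dataset_dir)


def load_local_ag_news(dataset_dir):
    """Hypothetical wrapper: load the train/test splits from local files."""
    # Imported lazily so root_for() can be used without torchtext installed.
    from torchtext.datasets import AG_NEWS

    return AG_NEWS(root=root_for(dataset_dir), split=("train", "test"))
```

With the layout from the question, load_local_ag_news('./myfolder/AG_NEWS') would pass the myfolder directory as root rather than the AG_NEWS folder itself.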