johannespischinger / senti_anal

MIT License

bug loading dataset from huggingface google drive #68

Closed max-27 closed 2 years ago

max-27 commented 2 years ago

When we run training_pl.py inside a Docker container, the cached dataset files don't work because they depend on the local path:

```
File "/usr/local/lib/python3.9/site-packages/datasets/utils/info_utils.py", line 40, in verify_checksums
    raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://drive.google.com/u/0/uc?id=0Bz8a_Dbh9QhbaW12WVVZS2drcnM&export=download']
```

I would suggest saving the train, val, and test datasets in the processed folder as pickle files and loading them with torch.load().
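The proposed workaround could be sketched roughly as below. The split contents and the `data/processed` path are placeholders, not the project's actual data layout; and since `torch.save`/`torch.load` wrap pickle, the stdlib `pickle` module is used here to keep the sketch self-contained:

```python
import pickle
from pathlib import Path

# Placeholder splits; in the project these would be the preprocessed
# train/val/test datasets originally fetched from Google Drive.
splits = {
    "train": [("good movie", 1), ("bad movie", 0)],
    "val": [("okay movie", 1)],
    "test": [("great movie", 1)],
}

# Assumed location of the processed folder mentioned in the issue.
processed = Path("data/processed")
processed.mkdir(parents=True, exist_ok=True)

# Save each split once, so later runs (e.g. inside Docker) can load
# the files directly instead of re-downloading from Google Drive.
for name, data in splits.items():
    with open(processed / f"{name}.pkl", "wb") as f:
        pickle.dump(data, f)

def load_split(name: str):
    """Load a previously saved split (torch.load would work the same way)."""
    with open(processed / f"{name}.pkl", "rb") as f:
        return pickle.load(f)

train_data = load_split("train")
```

This sidesteps the checksum verification entirely, since the datasets library is no longer involved at load time.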

michaelfeil commented 2 years ago

Resolved with DVC.