salesforce / decaNLP

The Natural Language Decathlon: A Multitask Challenge for NLP
BSD 3-Clause "New" or "Revised" License

Loading datasets #26

Closed by andreamad8 6 years ago

andreamad8 commented 6 years ago

While running this command as suggested in the repo:

nvidia-docker run -it --rm -v `pwd`:/decaNLP/ -u $(id -u):$(id -g) decanlp bash -c "python /decaNLP/train.py --train_tasks squad iwslt.en.de cnn_dailymail multinli.in.out sst srl zre woz.en wikisql schema --train_iterations 1 --gpu 1"

It downloads and processes all of the following datasets: squad iwslt.en.de cnn_dailymail multinli.in.out sst srl zre. Once it reaches woz.en, I get the following error:

process_main - zre has 840000 training examples
process_main - Loading woz.en
process_main - Adding woz.en to training datasets
downloading woz_train_en.json
downloading woz_test_de.json
downloading woz_test_en.json
downloading woz_train_de.json
downloading woz_validate_de.json
downloading woz_validate_en.json
Traceback (most recent call last):
  File "/decaNLP/train.py", line 365, in <module>
    main()
  File "/decaNLP/train.py", line 352, in main
    field, train_sets, val_sets = prepare_data(args, field, logger)
  File "/decaNLP/train.py", line 67, in prepare_data
    split = get_splits(args, task, FIELD, **kwargs)[0]
  File "/decaNLP/util.py", line 138, in get_splits
    fields=FIELD, root=args.data, **kwargs)
  File "/decaNLP/text/torchtext/datasets/generic.py", line 976, in splits
    os.path.join(path, f'{train}.jsonl'), fields, **kwargs)
  File "/decaNLP/text/torchtext/datasets/generic.py", line 897, in __init__
    ex = data.Example.fromlist([context, question, answer, CONTEXT_SPECIAL, QUESTION_SPECIAL, context_question, woz_id], fields)
  File "/decaNLP/text/torchtext/data/example.py", line 62, in fromlist
    setattr(ex, name, [sys.intern(x) for x in field.preprocess(val)])
TypeError: 'int' object is not iterable

Thanks in advance

Andrea
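
For readers following the traceback above: the last frame iterates over whatever field.preprocess(val) returns, so any value that comes back as a plain integer (woz_id is a likely candidate, though the thread does not confirm it) would raise exactly this TypeError. The sketch below reproduces that failure mode in isolation. The function names preprocess_identity, intern_tokens, and safe_intern_tokens are hypothetical stand-ins and are not part of torchtext or decaNLP, and the guard shown is only one possible workaround, not necessarily what the fix commit does.

```python
import sys


def preprocess_identity(value):
    # Hypothetical stand-in for a field whose preprocess step
    # returns its input unchanged (e.g. a non-text, numeric field).
    return value


def intern_tokens(value, preprocess=preprocess_identity):
    # Same shape as the failing line in the traceback: iterate over the
    # preprocessed value and intern every element.
    return [sys.intern(x) for x in preprocess(value)]


def safe_intern_tokens(value, preprocess=preprocess_identity):
    # Assumed workaround: intern only when the result is a sequence of
    # strings; pass any other value (such as an int id) through untouched.
    processed = preprocess(value)
    if isinstance(processed, (list, tuple)):
        return [sys.intern(x) if isinstance(x, str) else x for x in processed]
    return processed


if __name__ == "__main__":
    print(intern_tokens(["find", "food"]))  # a token list works
    try:
        intern_tokens(7)  # an integer id does not
    except TypeError as err:
        print("TypeError:", err)
    print(safe_intern_tokens(7))  # the guarded version passes the id through
```

Interning token strings is a common way to cut memory use when the same tokens repeat across many examples, which is why a comprehension like this can break fields that are not token lists.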

bmccann commented 6 years ago

Looks like I broke this recently trying to save on memory. I’ll have this fixed today.

bmccann commented 6 years ago

847a9dd0947a6360ce41cb480a83a3b732ea7894 should fix this -- let me know if not

andreamad8 commented 6 years ago

Issue solved. Thanks a lot for the quick response.