roholazandie / EmpTransfo

KeyError: 'topic' when training with the train_multihead.py script #3

Open Nkonstan opened 3 years ago

Nkonstan commented 3 years ago

  File "train_multihead.py", line 202, in train
    train_loader, val_loader, train_sampler, valid_sampler = get_data_loaders(config, tokenizer)
  File "train_multihead.py", line 118, in get_data_loaders
    topic = dialog["topic"]
KeyError: 'topic'

From what I understand, there is no 'topic' key in the dataset format you propose. Is that right?

roholazandie commented 3 years ago

Make sure you are using the modified format of the dataset and not the original one. The modified one is here: https://drive.google.com/open?id=1T4AdY7wku8srL_xWSxgt-OHqdLFVo3s3

If you are using this one, the 'topic' key will be present.

Nkonstan commented 3 years ago

Thanks for your answer. I confirm that I am using the format you sent me, but it does not contain a "topic" key. You can print the JSON file to check it too.
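For anyone who wants to run the same check, here is a minimal sketch that reports which dialog records lack a given key. The layout it assumes (either a list of dialog dicts, or a dict mapping split names to such lists) is a guess and not confirmed by the repository, and `find_missing_key` is a hypothetical helper name, not part of EmpTransfo.

```python
import json


def find_missing_key(data, key="topic"):
    """Return {split_name: [indices]} for dialog records that lack `key`."""
    # Assumption: `data` is a list of dialog dicts, or a dict mapping
    # split names (e.g. "train", "valid") to lists of dialog dicts.
    splits = data if isinstance(data, dict) else {"all": data}
    missing = {}
    for split, dialogs in splits.items():
        idx = [
            i for i, dialog in enumerate(dialogs)
            if not (isinstance(dialog, dict) and key in dialog)
        ]
        if idx:
            missing[split] = idx
    return missing


# Tiny in-memory stand-in for the real dataset file (made-up records).
sample = json.loads(
    '{"train": [{"topic": "work", "utterances": []}, {"utterances": []}],'
    ' "valid": [{"topic": "travel", "utterances": []}]}'
)
report = find_missing_key(sample)
print(report)
```

To run it against the downloaded file, load that file with `json.load` and pass the result to `find_missing_key`. An empty result means every record has the key, while a non-empty one lists the records that would trigger the `KeyError` at `topic = dialog["topic"]`.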

roholazandie commented 2 years ago

Yes, you are right, I didn't upload that dataset. Here it is: https://drive.google.com/file/d/17nL6q3eiG4IKAZe-CregN5db4eGkgz4G/view?usp=sharing

Let me know if this one works for you. I am also updating the README file accordingly.