Closed A1exRey closed 1 year ago
Hi, thanks for the great collection of datasets. However, it seems that not all of the datasets are preprocessed correctly. MultiRC requires the paragraph, the question, and each individual answer to be concatenated for classification, but the current task definition takes only the question and drops the rest. In tasks.py:
super_glue___multirc = Classification(sentence1="question", labels="label")
And during load we get:
from tasksource import list_tasks, load_task
ddf = load_task('super_glue/multirc')
| index | sentence1 | labels |
| -- | -- | -- |
| 0 | What did the high-level effort to persuade Pakistan include? | 0 |
| 1 | What did the high-level effort to persuade Pakistan include? | 0 |
| 2 | What did the high-level effort to persuade Pakistan include? | 1 |
| 3 | What did the high-level effort to persuade Pakistan include? | 1 |
| 4 | What did the high-level effort to persuade Pakistan include? | 1 |

This data cannot work for training: the same question appears with different labels, and nothing in the input distinguishes the rows, so a model has nothing to learn from. Maybe you should replace the code with something like the following, which puts all the data together (following the WiC example):
super_glue___multirc = Classification(
    sentence1=cat(["paragraph", "question", "answer"], " : "),
    labels="label",
)
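As a rough illustration of what that concatenation would produce for one row, here is a minimal sketch in plain Python. `join_fields` is a hypothetical stand-in written for this example, not the tasksource `cat` helper, and the paragraph and answer values are placeholders; only the question text comes from the output above.

```python
def join_fields(example, fields, sep=" : "):
    """Concatenate the named fields of one example into a single string."""
    return sep.join(str(example[name]) for name in fields)


# Placeholder row: the paragraph and answer are dummy values, not real MultiRC data.
row = {
    "paragraph": "<paragraph text>",
    "question": "What did the high-level effort to persuade Pakistan include?",
    "answer": "<candidate answer>",
    "label": 0,
}

# Each candidate answer now yields a distinct input string,
# so rows that share a question are no longer identical.
sentence1 = join_fields(row, ["paragraph", "question", "answer"])
print(sentence1)
```

With the answer included, two rows that share a paragraph and question but differ in the candidate answer get different `sentence1` values, which is what the per-answer labels need.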
I apologize for that mistake. I manually check the processed datasets (and I have also trained models on them), but there might be some errors I overlooked. The latest release fixes this one. Thanks a lot for your input.