YousefGh closed this 3 years ago.
Thanks for the catch. So, for your data, just uncommenting ['data'] fixed this issue, right? I think I had to modify this for some special JSON files and forgot to change it back.
Yes, it fixes the issue for any SQuAD-like JSON, because it changes the loop from iterating over the root dictionary (which yields its keys) to iterating over the array of articles inside 'data'.
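A minimal sketch of the fixed behavior, assuming the standard SQuAD v1.1 layout (a root object with a 'version' string and a 'data' list of articles). The names combine_squad and combine_squad_files are hypothetical and this is not the actual combine_json_files from SOQAL.data_helpers.data_split; it only shows iterating over the articles inside 'data' instead of over the root's keys.

```python
import json
from typing import Any, Dict, List


def combine_squad(roots: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: merge SQuAD-like roots by concatenating their articles."""
    # The 'version' value here is an assumption; it is not taken from SOQAL.
    combined: Dict[str, Any] = {"version": "combined", "data": []}
    for root in roots:
        # Iterate over the articles inside 'data'. Iterating over `root` itself
        # would only yield its keys (e.g. 'version' and 'data').
        for article in root["data"]:
            combined["data"].append(article)
    return combined


def combine_squad_files(paths: List[str], out_path: str) -> None:
    """Hypothetical helper: load each JSON file, combine the articles, write the result."""
    roots = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            roots.append(json.load(f))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(combine_squad(roots), f, ensure_ascii=False)
```

For example, combine_squad_files(["a.json", "b.json"], "turk_combined_all.json") would write a file whose 'data' list holds the articles of a.json followed by those of b.json.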
This is my validator:
a = QustionAnsweringJSON("a.json")
b = QustionAnsweringJSON("b.json")
ab = QustionAnsweringJSON("turk_combined_all.json")
a.show_info()
b.show_info()
ab.show_info()
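The QustionAnsweringJSON class itself is not shown in this thread. As a hedged sketch only, counts like the ones below could be computed from a parsed SQuAD-like root, assuming the SQuAD v1.1 nesting data → paragraphs → qas → answers. The functions squad_counts and show_info here are hypothetical stand-ins, not the validator's actual code.

```python
from typing import Any, Dict


def squad_counts(root: Dict[str, Any]) -> Dict[str, int]:
    """Count articles, paragraphs, questions and answers in a SQuAD-like root."""
    articles = root["data"]
    paragraphs = [p for article in articles for p in article["paragraphs"]]
    questions = [qa for p in paragraphs for qa in p["qas"]]
    answers = [ans for qa in questions for ans in qa["answers"]]
    return {
        "Articles": len(articles),
        "Paragraphs": len(paragraphs),
        "Questions": len(questions),
        "Answers": len(answers),
    }


def show_info(root: Dict[str, Any]) -> None:
    """Print the counts in the same 'Number of X: N' style as the validator output."""
    for name, count in squad_counts(root).items():
        print(f"Number of {name}: {count}")
```

If 'data' contains key strings instead of article dictionaries, article["paragraphs"] raises a TypeError, so a check like this also catches the bug discussed in this issue.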
It outputs:

a.json
Number of Articles: 78
Number of Paragraphs: 234
Number of Questions: 702
Number of Answers: 702

b.json
Number of Articles: 77
Number of Paragraphs: 231
Number of Questions: 693
Number of Answers: 693

turk_combined_all.json
Number of Articles: 155
Number of Paragraphs: 465
Number of Questions: 1395
Number of Answers: 1395
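Because the combined file should simply concatenate the articles of a.json and b.json, each count for turk_combined_all.json should equal the sum of the two inputs. A quick check using the numbers printed above:

```python
# Counts copied from the show_info() output, in call order (a, b, ab).
a_counts = {"Articles": 78, "Paragraphs": 234, "Questions": 702, "Answers": 702}
b_counts = {"Articles": 77, "Paragraphs": 231, "Questions": 693, "Answers": 693}
ab_counts = {"Articles": 155, "Paragraphs": 465, "Questions": 1395, "Answers": 1395}

# Collect any count where a + b does not match the combined file.
mismatches = {
    k: (a_counts[k] + b_counts[k], ab_counts[k])
    for k in ab_counts
    if a_counts[k] + b_counts[k] != ab_counts[k]
}
print("OK" if not mismatches else f"Mismatch: {mismatches}")  # prints "OK"
```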
Thanks for this, Yousef! I added a notice and fixed it.
When using the combine_json_files function in SOQAL.data_helpers.data_split with a SQuAD-like JSON format (DrQA format B), it produces a JSON file whose 'data' value contains only the keys of both input files. This is caused by a line in combine_json_files that has its ['data'] part commented out, followed by a loop that iterates through dictionary keys (which is what iterating over a dictionary does). The loop does not iterate through articles; it is equivalent to doing this:
for key in data
where data is the JSON root, which produces:
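A minimal, self-contained illustration using two tiny hypothetical SQuAD-like roots (file_a and file_b are made up, not the real files, and the loop body is a simplified stand-in for the actual combine_json_files code): iterating directly over each root dictionary yields its keys, so the combined 'data' list ends up holding key strings instead of articles.

```python
from typing import Any, Dict, List

# Two tiny hypothetical SQuAD-like roots standing in for the real files.
file_a = {"version": "1.1", "data": [{"title": "A", "paragraphs": []}]}
file_b = {"version": "1.1", "data": [{"title": "B", "paragraphs": []}]}

buggy: Dict[str, List[Any]] = {"data": []}
for data in (file_a, file_b):
    # Without ['data'], this loop walks the root's keys, not its articles.
    for key in data:
        buggy["data"].append(key)

print(buggy["data"])  # ['version', 'data', 'version', 'data']
```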