Closed Marwa199527 closed 3 years ago
context[92:96] = 1854. However, after the tokenization process, the tokens are as follows,
The calculated tokens do not capture 1854 as a separate token because it is immediately followed by a hyphen. Hence the end index of our calculated span does not match the end of the ground-truth span. Most of the errors are due to issues like this one.
Thank you very much @kushalj001
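To make the mismatch described above concrete, here is a minimal sketch. It assumes a plain whitespace tokenizer and a made-up context sentence; the tokenizer actually used in the notebook may split text differently, and the helper names tokenize_with_offsets and find_token_span are hypothetical, not taken from the original code.

```python
import re

def tokenize_with_offsets(text):
    # Hypothetical whitespace tokenizer: returns (token, char_start, char_end).
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def find_token_span(text, answer_start, answer_text):
    # Map a character-level answer span onto token indices.
    # A boundary that does not coincide with a token boundary stays None.
    answer_end = answer_start + len(answer_text)
    start_idx = end_idx = None
    for i, (_, tok_start, tok_end) in enumerate(tokenize_with_offsets(text)):
        if tok_start == answer_start:
            start_idx = i
        if tok_end == answer_end:
            end_idx = i
    return start_idx, end_idx

# Made-up example: "1854" is glued to "-1856", so no token ends where the answer ends.
context = "The war lasted from 1854-1856 in Crimea."
answer = "1854"
span = find_token_span(context, context.index(answer), answer)
print(span)  # (4, None): start matches a token, end does not
```

An example whose token span contains None is one that would be counted as an error and dropped, which is the situation the hyphen case produces.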
Thank you @kushalj001 for your great work, it really helped me. I have a few questions and I hope you can reply as soon as possible. I am using an Arabic dataset in SQuAD format, and almost 2300 answers were dropped because they have a wrong start index, which is a lot. 1) How can I fix the start index in my data? 2) I did not understand how your function found these errors in the start index. 3) I know that you explained why this problem happens, but I did not really understand the cause of it. Please answer the above questions, I would be really grateful.
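One possible way to approach question 1, offered only as a sketch: if the stored character offset itself is wrong (as opposed to the tokenization mismatch discussed earlier), the answer text can be searched for in the context and the nearest occurrence used as the corrected start. The function name repair_answer_start is hypothetical, and this assumes the answer string appears verbatim in the context, which may not hold if the dataset was normalized differently.

```python
def repair_answer_start(context, answer_text, answer_start):
    # If the stored offset already points at the answer, keep it.
    end = answer_start + len(answer_text)
    if context[answer_start:end] == answer_text:
        return answer_start

    # Otherwise collect every occurrence of the answer text in the context.
    positions = []
    pos = context.find(answer_text)
    while pos != -1:
        positions.append(pos)
        pos = context.find(answer_text, pos + 1)

    # Answer text not present verbatim: nothing can be repaired here.
    if not positions:
        return None

    # Pick the occurrence closest to the (wrong) stored offset.
    return min(positions, key=lambda p: abs(p - answer_start))

# Made-up example: the stored offset 5 is wrong, the answer actually starts at 8.
fixed = repair_answer_start("born in Cairo in 1995", "Cairo", 5)
print(fixed)
```

This only corrects character offsets. Answers whose boundaries fall inside a token, like the hyphen case, would still be dropped unless the tokenizer is changed or the span is snapped to token boundaries.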