Open vidyap-xgboost opened 4 years ago
Hi @vidyap-xgboost, this is a known issue and I'm working on the fix; see issue #11. Sorry for the inconvenience. I will let you know when it's fixed.
Thanks for the heads-up. Will look forward to the fix.
For the time being:
if answer_text in sent:
    ans_start_idx = sent.index(answer_text)
else:
    continue
This lets you bypass the ValueError by skipping mismatched answer/sent pairs. In a test, it kept matching answer/sent pairs past the point where I had been getting the error: without the check it failed after 3 matches, and with the if statement it completed the task with a total of 10 answer/question pairs. It is not a fix, but it will let you process an entire, complex text.
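A minimal sketch of how this guard behaves inside a loop. The function name `locate_answers` and the sample data are hypothetical and not taken from the repository; only `sent`, `answer_text`, and `ans_start_idx` come from the snippet above.

```python
def locate_answers(pairs):
    """Return (sentence, answer, start index) for pairs whose answer occurs in the sentence."""
    located = []
    for sent, answer_text in pairs:
        if answer_text in sent:
            ans_start_idx = sent.index(answer_text)
        else:
            # Skip the mismatched pair instead of letting sent.index() raise ValueError
            continue
        located.append((sent, answer_text, ans_start_idx))
    return located


# Made-up sample data for illustration only
pairs = [
    ("Python is a programming language.", "Python"),
    ("The sky is blue.", "green"),  # answer is not in the sentence, so the pair is skipped
]
print(locate_answers(pairs))
```

Skipped pairs are dropped silently here; logging them instead would make it easier to see how many answers the model got wrong.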
I was using James Baldwin's essay "If Black English Isn't a Language, Then Tell Me, What Is?" specifically for the purpose of stress testing. You can find it here:
(apologies for the code format, i couldn't get it to break by line)
Hi @patil-suraj, is this issue fixed now?
I would suggest using the if statement as a workaround. There are three failure cases: sometimes the answer order gets out of place; sometimes the answer span and the answer do not match exactly (one is "Capital" and the other is "capital"); and sometimes the model creates an answer that does not appear in the text. The first is an issue with the ordering of the output, the second can be fixed by applying .lower() to both sent and answer_text before calling sent.index(), and the last is an issue inside the model itself. The if statement is therefore the only thing that bypasses all three errors. Using .lower() together with the if statement ensures you still get answers that appear in the text but differ in capitalization.
Added: there is a rarer case where an "s" gets added to or dropped from the answer span.
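A rough sketch of combining .lower() with the membership check, as described above. The helper name `find_answer_start` and the example sentences are made up for illustration. Note that the returned index refers to the lowercased sentence; for plain ASCII text this is the same position as in the original sentence, but that is an assumption that may not hold for every Unicode character.

```python
def find_answer_start(sent, answer_text):
    """Return the start index of answer_text in sent, ignoring case, or None if absent."""
    sent_lower = sent.lower()
    answer_lower = answer_text.lower()
    if answer_lower in sent_lower:
        return sent_lower.index(answer_lower)
    # The answer does not occur in the sentence at all (e.g. the model invented it)
    return None


print(find_answer_start("The Capital of France is Paris.", "capital"))  # 4
print(find_answer_start("The Capital of France is Paris.", "Berlin"))   # None
```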
When you use string_object.index(substring), it looks for the first occurrence of substring in string_object. If substring is present, the method returns the index at which it starts; otherwise, it raises ValueError: substring not found.
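A short illustration of that behavior, with str.find() shown alongside as an alternative that returns -1 instead of raising. The variable names are arbitrary.

```python
sent = "Hello, World!"

found_at = sent.index("World")  # 7: the substring starts at index 7

try:
    sent.index("world")  # lowercase "w" does not occur in sent
    raised = False
except ValueError:
    raised = True  # index() raises when the substring is missing

missing = sent.find("world")  # -1: find() signals "not found" without raising

print(found_at, raised, missing)
```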
Using Python’s “in” operator
The simplest way to check whether a string contains a substring in Python is the "in" operator. It returns True if the string contains the substring; otherwise, it returns False.
text = "Hello, World!"  # avoid naming the variable str, which shadows the built-in type
print("World" in text)  # output is True
Python's "in" operator takes two operands, one on the left and one on the right, and returns True if the left string is contained within the right string. It is important to note that the "in" operator is case sensitive, i.e., it treats uppercase and lowercase characters as different.
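To make the case sensitivity concrete, here is a small example. Normalizing both sides with str.casefold() is one common way to compare while ignoring case; it is shown as an option, not as what the project itself does.

```python
text = "Hello, World!"

exact = "World" in text        # True: same capitalization
wrong_case = "world" in text   # False: "w" and "W" are different characters
ignoring_case = "world".casefold() in text.casefold()  # True: both sides normalized first

print(exact, wrong_case, ignoring_case)
```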
I ran the following code on colab:
and I get the following error:
I don't understand why this error occurs when I gave it proper text.
This also occurred with