wit-ai / wit

Natural Language Interface for apps and devices
https://wit.ai/

Urgent Bug - difference in unicode character encodings prevent keyword & free-text to merge results together #2702

Open idanadut opened 7 months ago

idanadut commented 7 months ago

Do you want to request a feature, report a bug, or ask a question about wit? BUG

What is the current behavior? Foreign language words that are explicitly mentioned in the entity are sometimes not tagged to that entity's keyword.

This is a bug that @patapizza identified in the past, writing: "This is indeed a bug. It looks like a difference in character encodings prevent keyword & free-text to merge results together." @patapizza and Longfang worked on it and fixed it twice in the past, the last time probably a year ago, but it seems this bug keeps coming back every few months. Here is the original issue: https://github.com/wit-ai/wit/issues/2540 (It has nothing to do with changing the entities' lookup strategy! That was suggested at first by mistake. It was the character-encoding merge fix that solved it.)

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem. Try the utterance "פיפא". It is supposed to be tagged to an entity called "Version" with the value "Fifa". Try the utterance "שער צבע לבן". The word "צבע" is supposed to be tagged to an entity called "Connectors" with the value "צבע".

What is the expected behavior? Words that are explicitly mentioned in the entity should be tagged to that entity's keyword. It's urgent since words are not being tagged correctly.

If applicable, what is the App ID where you are experiencing this issue? If you do not provide this, we cannot help. 337695227605625
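
To illustrate the kind of mismatch the quoted diagnosis points at, here is a minimal Python sketch of how two visually identical Hebrew strings can differ at the code-point level and how normalizing both sides makes them comparable. The thread does not say which encodings or normalization forms Wit's keyword and free-text paths use, so the helper names (`normalize_keyword`, `same_keyword`) and the choice of NFC are assumptions for illustration only.

```python
import unicodedata


def normalize_keyword(text: str) -> str:
    """Return the NFC-normalized form of text (an assumed canonical form)."""
    return unicodedata.normalize("NFC", text)


def same_keyword(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the same form."""
    return normalize_keyword(a) == normalize_keyword(b)


# Two encodings of "shin with shin dot": a single presentation-form
# code point versus a base letter followed by a combining point.
precomposed = "\ufb2a"
decomposed = "\u05e9\u05c1"

print(precomposed == decomposed)              # False: raw code points differ
print(same_keyword(precomposed, decomposed))  # True: equal after normalization
```

If one component stored keywords in one form and another component emitted spans in a different form, a raw string comparison would fail in this way even though the text looks identical on screen.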

nomiero commented 7 months ago

Thanks for reporting the issue, @idanadut. I'm not sure whether this is the same Unicode-related issue as before. I tried the utterances from this issue and from the original one. It looks like the utterance that is not behaving as expected is "שער צבע לב".

It seems we can't recognize שער צבע לבן as described, but we do seem to recognize פיפא. Can you share the response you get for the first utterance?

For the second utterance, can you try adding it to the training utterances and see if that helps? (Note that the way we create the free-text model for your app means that we try to fit the model across all utterances, but this doesn't guarantee that the output model will handle all of them.)

idanadut commented 7 months ago

Hi @nomiero, 1) For "פיפא", your UI gives entity="Brand" (50%) with the value "פיפא". It should be "Version" with the value "Fifa" (since it's explicitly listed there), so I believe this one doesn't work either. As for other examples, I had many, which I had also added as utterances in the past and which still weren't tagged correctly. They now seem to be tagged more correctly than before; was something changed? Anyway, many examples still don't work as expected. For example, try "סונוס". It should be tagged as "Brand"="Sonos" (since it's explicitly there), but it is tagged as entity "Version" (with 51%).

Do you have any solution for this?

2) For "שער צבע לבן", it is now tagged differently than it was yesterday and the days before. Did you change anything? I would still prefer that "צבע" be tagged as entity "Connectors", since there are many similar utterances, but I can't complain now, since it's tagged to another entity that also has this keyword. After all that, I added this specific utterance (tagging "צבע" as "Connectors") and waited for training to finish, and "צבע" is still tagged as entity "ItemType". If you think this is also a bug, let me know, but again, at least it now goes to an entity that does have this keyword (which it didn't before).

Of course, I can't be sure; it might be something different. But please verify again whether this is the same Unicode characters issue that @patapizza saw in the past, since it has happened 2-3 times before, and the same types of examples, and even the same examples, were not working.

Thanks!

nomiero commented 7 months ago

Glad we got some improvement. Yes, I mainly retrained the model to make sure that nothing went wrong with the last training for your app. Seems like this helped a bit.

For the examples not resolved yet, I'm not sure we can easily improve these. To check whether this is related to the training examples, maybe you can create a test app, add a small number of training examples for the ones that are failing, and see if they resolve correctly. I will also investigate to try to understand the examples you mentioned a bit more.

idanadut commented 7 months ago

Hi @nomiero,

Thanks. It's not related to the training examples I have. Just try this: "סונוס". It should be tagged as entity "Brand"="Sonos" (since it's explicitly there), but it is tagged as entity "Version" (with confidence 51%). When keywords are explicitly listed, the utterance should be tagged to them, right? It seems that some specific keywords in some entities are not being tagged.

BTW, the Unicode characters issue I mentioned ("a difference in character encodings prevent keyword & free-text to merge results together") did the exact same thing.
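
The check described above (send a single utterance and see which entity comes back) can be sketched against Wit's HTTP `GET /message` endpoint. This is only a rough sketch: the `version` default, the `TOKEN` placeholder, the helper names, and the exact shape of the `entities` object in the sample response are assumptions, and the sample response is hand-written rather than captured from the app in this issue.

```python
from urllib.parse import urlencode
from urllib.request import Request

WIT_MESSAGE_URL = "https://api.wit.ai/message"


def build_message_request(utterance, token, version="20240101"):
    """Build (but do not send) a GET /message request for one utterance.

    urlencode percent-encodes the Hebrew text as UTF-8 bytes.
    """
    query = urlencode({"v": version, "q": utterance})
    return Request(
        f"{WIT_MESSAGE_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def top_entities(response):
    """Map each entity name to its (value, confidence) with highest confidence.

    Assumes response["entities"] maps names to lists of candidate dicts.
    """
    result = {}
    for name, candidates in response.get("entities", {}).items():
        if not candidates:
            continue
        best = max(candidates, key=lambda c: c.get("confidence", 0.0))
        result[name] = (best.get("value"), best.get("confidence"))
    return result


request = build_message_request("סונוס", "TOKEN")  # TOKEN is a placeholder

# Hypothetical response, shaped to mirror the behavior reported above.
sample_response = {
    "entities": {"Version": [{"value": "סונוס", "confidence": 0.51}]}
}
print(top_entities(sample_response))
```

Sending the request (for example with `urllib.request.urlopen`) and passing the parsed JSON to `top_entities` would show at a glance whether "Brand" or "Version" wins for a given keyword.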

idanadut commented 7 months ago

Hi @nomiero

Any updates? This urgent issue is still ongoing. Thanks!

ChrisyShine commented 7 months ago

For the utterance "שער צבע לבן" and the word "צבע", the problem is that the word "צבע" has been registered as a keyword/synonym in both entities:

  1. ItemType with Keyword "Crayon"
  2. Connectors with Keyword "צבע"

Our service predicts both entities with the keyword lookup strategy. The two predictions have the same priority, so their final confidences are the same, which means there is no way to tell which prediction is preferred. If you want to fix this specific issue, simply removing "צבע" from the entity "ItemType" will work.

And for other similar issues, can you review all the entity definitions and see whether there are similar duplicated annotations for the wrong words?
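
The review suggested above could be automated along these lines. This is a minimal sketch that assumes the entity definitions have already been exported into a plain dictionary of entity name to keyword to synonym list; that layout and the function name `find_shared_synonyms` are hypothetical, and the sample data only mirrors the entities named in this thread.

```python
from collections import defaultdict


def find_shared_synonyms(entities):
    """Return synonyms registered under more than one entity.

    entities: {entity_name: {keyword: [synonym, ...]}}
    Result:   {synonym: sorted list of (entity_name, keyword) pairs}
    """
    owners = defaultdict(set)
    for entity, keywords in entities.items():
        for keyword, synonyms in keywords.items():
            for synonym in synonyms:
                owners[synonym].add((entity, keyword))
    return {
        synonym: sorted(pairs)
        for synonym, pairs in owners.items()
        if len({entity for entity, _ in pairs}) > 1
    }


# Sample data mirroring the entities mentioned in the thread.
entities = {
    "ItemType": {"Crayon": ["צבע"]},
    "Connectors": {"צבע": ["צבע"]},
    "Brand": {"Sonos": ["סונוס"]},
}

for synonym, pairs in find_shared_synonyms(entities).items():
    print(synonym, pairs)
```

Any synonym reported here is a candidate for the tie described above, since keyword lookup would match it in each listed entity with the same priority.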

idanadut commented 7 months ago

Hi @ChrisyShine, you're right. I mentioned this in one of my last comments, and I'm not sure that's a good example. But the other ones ("פיפא", "סונוס"), and I have many more, don't have similar duplicates.

ChrisyShine commented 7 months ago

For the other ones ("פיפא", "סונוס"), we need to make a fix in our service. We will update you when the fix is ready.
