huggingface / neuralcoref

✨Fast Coreference Resolution in spaCy with Neural Networks
https://huggingface.co/coref/
MIT License
2.85k stars 476 forks

πŸ™… Inaccurate model coref predictions master thread #215

Closed svlandeg closed 2 months ago

svlandeg commented 4 years ago

Master thread for collecting incorrect and/or problematic coreference predictions with the pretrained models. These can be interesting test cases when training the next version of the model.

petulla commented 4 years ago

updated to include article url, doh *

For this article, the model struggles with NASA's James Webb Space Telescope.

This is the mentions array:

```
Mauna Kea: [Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, it, Mauna Kea, Mauna Kea]
Hawaii: [Hawaii, Hawaii, Hawaii, Hawaii, Hawaii, its, Hawaii, Hawaii, Hawaii, Hawaii]
Spain: [Spain, Spain, Spain, Spain, Spain, Spain]
Earth: [Earth, Earth, Earth, Earth]
astronomers: [astronomers, their]
the world's largest telescope in Hawaii: [the world's largest telescope in Hawaii, the telescope, the telescope, it, the Webb telescope, the telescope, it, Thirty Meter Telescope, Its, the telescope on La Palma, the telescope in Spain, this telescope]
the islands': [the islands', their]
Meter Telescope officials: [Meter Telescope officials, their]
their backup site atop a peak on the Spanish Canary island of La Palma: [their backup site atop a peak on the Spanish Canary island of La Palma, it, it, the site]
La Palma: [La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma]
Mauna Kea's: [Mauna Kea's, Mauna Kea's]
Bolte, who has used existing Mauna Kea telescopes: [Bolte, who has used existing Mauna Kea telescopes, he]
Bolte: [Bolte, Bolte, Bolte, The telescope group's Bolte]
Webb: [Webb, Webb]
Mather: [Mather, He, Mather, he]
bright stars: [bright stars, them]
Loeb: [astrophysicist Avi Loeb, who chairs Harvard University's astronomy department, Loeb, Loeb, he]
The Native Hawaiian opponents: [The Native Hawaiian opponents, themselves, their, They]
the telescope group: [the telescope group, The telescope group]
protest leader Kealoha Pisciotta: [protest leader Kealoha Pisciotta, Pisciotta]
Thirty Meter Telescope officials: [Thirty Meter Telescope officials, they]
the Canary Islands: [the Canary Islands, the Canary Islands, the Canary Islands]
Others: [Others, their, their]
Jos Manuel Vilchez, an astronomer with Spain's Higher Council of Scientific Research and a former member of the scientific committee of the Astrophysics Institute of the Canary Islands: [Jos Manuel Vilchez, an astronomer with Spain's Higher Council of Scientific Research and a former member of the scientific committee of the Astrophysics Institute of the Canary Islands, We, We]
Vilchez: [Vilchez, Vilchez, Vilchez, Vilchez]
Native Hawaiians: [Native Hawaiians, their, they, Native Hawaiians]
```

Webb is broken out as if it were a last name, when it is actually part of the telescope's name. In general, the model struggles to distinguish between the two telescopes mentioned in the article.

I'm wondering if a BERT span-based model might be an option for the next release? I tried the above text with one and it is slightly better (though still imperfect). https://github.com/mandarjoshi90/coref
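For anyone triaging cases like this, the clusters above can be dumped programmatically via neuralcoref's spaCy extension attributes (`doc._.has_coref`, `doc._.coref_clusters`). A minimal sketch; the model name and sample sentence are placeholders:

```python
def clusters_to_dict(clusters):
    """Map each cluster's main mention text to the texts of all its mentions."""
    return {str(cluster.main): [str(m) for m in cluster.mentions]
            for cluster in clusters}

def dump_clusters(text):
    # Requires spaCy + neuralcoref installed; en_core_web_lg is a placeholder model.
    import spacy
    import neuralcoref
    nlp = spacy.load("en_core_web_lg")
    neuralcoref.add_to_pipe(nlp)
    doc = nlp(text)
    return clusters_to_dict(doc._.coref_clusters) if doc._.has_coref else {}
```

Running `dump_clusters` over the article text should reproduce a mapping like the one pasted above, which makes diffing predictions between model versions easier.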

Atul-Anand-Jha commented 4 years ago

Hey @svlandeg, I have observed that the live demo at https://huggingface.co/coref/ is surprisingly accurate in many cases where my local implementation of the same model fails. To confirm this, I tried resolving coreference with both models, the latest neuralcoref 4.0 and neuralcoref-lg-3.0.0, and my results are far poorer than your live demo's. I am attaching screenshots to illustrate the situation. Please have a look. Does the live demo implement a different model than the two mentioned above? If so, how can we use it in our project?

Fig: our implementation results (neuralcoref v3 + neuralcoref v4)

Fig: your live demo result

EvanFabry commented 4 years ago

> Hey @svlandeg, I have observed that the live demo at https://huggingface.co/coref/ is surprisingly accurate in many cases where my local implementation of the same model fails. [...] Does the live demo implement a different model than the two mentioned above? If so, how can we use it in our project?

+1. I've noticed discrepancies between performance locally and in the dev environment. @svlandeg @thomwolf, can you comment on what exactly is currently served by the demo environment?

svlandeg commented 4 years ago

I wasn't involved with this project when the demo environment was created. However, note that it's not just the trained model version that makes a difference, but also the specific hyperparameters used when making predictions. So that is definitely something you can "play" with too.
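To make that "playing" systematic, one can sweep the prediction-time knobs that `neuralcoref.add_to_pipe` exposes (`greedyness`, `max_dist`, `max_dist_match`). A sketch, with placeholder values and model name, not a recommendation of particular settings:

```python
from itertools import product

def param_grid(**options):
    """Yield one dict per combination of the given hyperparameter values."""
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))

def sweep(text, grid):
    # Requires spaCy + neuralcoref; re-adds the pipeline component per setting.
    import spacy
    import neuralcoref
    nlp = spacy.load("en_core_web_lg")  # placeholder model
    results = {}
    for params in grid:
        if "neuralcoref" in nlp.pipe_names:
            nlp.remove_pipe("neuralcoref")
        neuralcoref.add_to_pipe(nlp, **params)
        results[tuple(sorted(params.items()))] = nlp(text)._.coref_resolved
    return results

settings = list(param_grid(greedyness=[0.4, 0.5, 0.6], max_dist=[50, 100]))
```

Comparing the resolved strings across settings against the demo output at least narrows down whether the gap is a hyperparameter issue or a model issue.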

Atul-Anand-Jha commented 4 years ago

> I wasn't involved with this project when the demo environment was created. However, note that it's not just the version that was trained that makes a difference, but also the specific hyperparameters used for making the predictions. So that is definitely something you can "play" with too.

Thanks. I actually tried different options for these hyperparameters, but none of the model releases uploaded here could match the demo.

aereobert commented 4 years ago

Same here.

With exactly the same sentence provided on the sample site, I tried all kinds of hyperparameter options, but I am still unable to reproduce the result. I installed spaCy and neuralcoref from source in a brand-new Docker container, so it should not be an environment problem.

On the sample page, the scores are usually around 3 to 15, whereas in my environment they are always around -2 to 2.

I am wondering how exactly to reproduce the result on the sample page.

Thank you very much!

@svlandeg @thomwolf


Resolved by compiling spaCy 2.1.0 and neuralcoref from source.
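For comparing score ranges like the 3-to-15 vs. -2-to-2 gap above, neuralcoref stores mention-pair scores on `doc._.coref_scores` (a dict of mention → {candidate antecedent: score}), assuming score storing is enabled. A small sketch for summarizing them; the model name is a placeholder:

```python
def score_range(coref_scores):
    """Return (min, max) over all mention-pair scores, or (None, None) if empty."""
    values = [s for pairs in coref_scores.values() for s in pairs.values()]
    return (min(values), max(values)) if values else (None, None)

def demo(text):
    # Requires spaCy 2.1.x + neuralcoref built from source, per the comment above.
    import spacy
    import neuralcoref
    nlp = spacy.load("en_core_web_lg")
    neuralcoref.add_to_pipe(nlp)
    return score_range(nlp(text)._.coref_scores)
```

Printing this range locally and comparing it with the demo's scores gives a quick signal of whether the two are even running the same weights.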

aamin3 commented 4 years ago

> Same here. With exactly the same sentence provided on the sample site, I tried all kinds of hyperparameter options, but I am still unable to reproduce the result. [...] Resolved by compiling spaCy 2.1.0 and neuralcoref from source.

Hello, can you confirm that the demo results can be achieved by compiling spaCy 2.1.0 and neuralcoref from source?

aereobert commented 4 years ago

> Hello, can you confirm that the demo results can be achieved by compiling spaCy 2.1.0 and neuralcoref from source?

Not exactly. I am just saying that this increased the accuracy on my side, from unusable to usable.

cfoster0 commented 4 years ago

Surprised to see the following.

On the example sentence in the README, neuralcoref predicts accurately:

Screen Shot 2020-04-17 at 3 48 29 PM

But on a slight modification, where we switch "sister" to "brother" and swap the pronouns, we get an incorrect prediction on the second sentence:

Screen Shot 2020-04-17 at 3 49 03 PM

noelslice commented 4 years ago

It would be very helpful if someone could shed some light on which model and combination of package versions are used in the demo environment. Like others here, I'm not able to reproduce what I see in the demo in my local setup, even after rolling spaCy back to 2.1.3 and building neuralcoref from source. It feels like the model served in the demo environment is a different model, or one trained with different word embeddings. Are the pretrained neuralcoref models tied to a specific spaCy language model tag? I've played with the parameters like others here, with some improvement, but I'm still seeing systematic differences from the live demo.
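One way to pin down such discrepancies is to capture the exact local environment and diff it against whatever a collaborator (or, if it were published, the demo) reports. A sketch; `spacy.__version__`, `nlp.meta`, and `nlp.vocab.vectors` are standard spaCy attributes, but the report keys are just illustrative:

```python
def diff_envs(local, remote):
    """Return {key: (local_value, remote_value)} for entries that differ."""
    keys = set(local) | set(remote)
    return {k: (local.get(k), remote.get(k))
            for k in keys if local.get(k) != remote.get(k)}

def env_report():
    # Requires spaCy with a model installed; en_core_web_lg is a placeholder.
    import spacy
    nlp = spacy.load("en_core_web_lg")
    return {
        "spacy": spacy.__version__,
        "model": "{}_{}-{}".format(nlp.meta["lang"], nlp.meta["name"],
                                   nlp.meta["version"]),
        "vectors": tuple(nlp.vocab.vectors.shape),
    }
```

The vectors shape matters because a neuralcoref model trained against one set of word embeddings will score differently when run on top of another.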

pborysov commented 4 years ago

> It would be very helpful if someone could shed some light on what model and combination of package versions are used in the demo environment. [...] I've played with the parameters like others here with some improvement but I'm still seeing systematic differences with the live demo.

Totally agree! The online demo is an ideal starting point, but only if it is reproducible :(

Keating950 commented 4 years ago

I'm not able to share much in the way of text for confidentiality reasons, but I'm noticing that the pre-trained model seems to be gravitating toward resolving "us" to "We." It might be useful to be able to blacklist certain words (e.g. "We") as never being satisfactory coreferents.

```diff
< It is not up to us to rectify things
---
> It is not up to We to rectify things

< It is absolutely an issue, but not only to us
---
> It is absolutely an issue, but not only to We
```

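Until something like that lands upstream (in released neuralcoref, `blacklist` is just a boolean toggling a fixed pronoun list), a similar effect can be approximated by post-processing: skip any replacement whose surface form is a blacklisted pronoun. A sketch over hypothetical character-offset replacements, not the library's own API:

```python
BLACKLIST = {"we", "us", "i", "me", "my", "you", "your"}  # example set

def resolve_skipping_blacklist(text, replacements, blacklist=BLACKLIST):
    """
    Apply (start, end, main_text) replacements to `text`, leaving untouched any
    span whose surface form is a blacklisted pronoun. Offsets are character
    positions and must be non-overlapping.
    """
    out, last = [], 0
    for start, end, main in sorted(replacements):
        if text[start:end].lower() in blacklist:
            continue  # leave blacklisted pronouns as they are
        out.append(text[last:start])
        out.append(main)
        last = end
    out.append(text[last:])
    return "".join(out)
```

With this, the "us" → "We" rewrites shown above would simply be suppressed, while ordinary resolutions still go through.
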
aamin3 commented 4 years ago

I agree; a more customizable blacklist (including they, it, these, who) would be wonderful. This is great tech as it is, but just a suggestion.

> I'm not able to share much in the way of text for confidentiality reasons, but I'm noticing that the pre-trained model seems to be gravitating toward resolving "us" to "We." It might be useful to be able to blacklist certain words (e.g. "We") as never being satisfactory coreferents.

  • neuralcoref 4.0
  • spacy 2.3.2

Keating950 commented 4 years ago

> I agree- to have a more customizable blacklist (including they, it, these, who) would be wonderful.

If you're interested in this feature, I've added it in my fork of this project. I'm still making sure it works, so I'm all ears for any feedback and review.

aamin3 commented 4 years ago

Thanks a lot, Keating950!

I see that in your fork, NO_COREF_LIST = ["i", "me", "my", "you", "your"] no longer exists in train/document.py or in neuralcoref.pyx. So it seems we no longer place a custom blacklist directly in the source code. Does that mean that each time neuralcoref is instantiated, I just pass the custom blacklist? For example:

coref = neuralcoref.NeuralCoref(nlp.vocab, greedyness=0.75, blacklist=["i", "me", "my", "you", "your", "they", "their", "it"])

Just want to verify that I'm using your fork as intended. Thanks!

> If you're interested in this feature, I've added it in my fork (https://github.com/Keating950/neuralcoref) of this project. I'm still making sure it works, so I'm all ears for any feedback and review.

Keating950 commented 4 years ago

Yup, that's exactly right. I've updated the README. Feel free to open an issue on that repo if you have any other questions.

aamin3 commented 4 years ago

Thanks a lot, man.

> Yup, that's exactly right. I've updated the README. Feel free to open an issue on that repo if you have any other questions.

lauwauw commented 3 years ago

Thanks for your work @Keating950! Very helpful!

Keating950 commented 3 years ago

@lauwauw Thanks! I've merged in the latest changes from this repo in light of the renewed interest.