stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

SUTime output differs with Java version and stanford-corenlp version #1137

Open kwalcock opened 3 years ago

kwalcock commented 3 years ago

Hi,

While running unit tests on some code that uses SUTime, we noticed that all tests passed with Java 1.8 but that one failed with Java 11.  In both cases we were using the same stanford-corenlp 3.9.2.  We tracked it down to a discrepancy in SUTime, which produces different output for the different Java versions.  The example SUTimeDemo program from https://nlp.stanford.edu/software/sutime.shtml produces for the sentence

The Food and Agriculture Organization of the United Nations (FAO), the United Nations Children's Fund (UNICEF) and the World Food Programme (WFP) stressed that while the deteriorating situation coincides with an unusually long and harsh annual lean season, when families have depleted their food stocks and new harvests are not expected until August, the level of food insecurity this year is unprecedented.

output with Java 8 that includes the word "annual".  Java 11 output is without it.

annual [from char offset 237 to 243] --> P1Y
August [from char offset 343 to 349] --> 2013-08
this year [from char offset 380 to 389] --> 2013
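
To pin down exactly which expressions differ between two runs, the demo's output lines can be parsed and compared as sets. The following is a minimal sketch, assuming the line format shown above; the class `SUTimeOutputDiff` and its methods are hypothetical helpers, not part of CoreNLP, and the Java 11 sample is reconstructed from the description (the same output minus "annual").

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: compares SUTimeDemo-style output from two runs.
class SUTimeOutputDiff {
    // Matches lines such as: annual [from char offset 237 to 243] --> P1Y
    private static final Pattern LINE =
        Pattern.compile("^(.+?) \\[from char offset (\\d+) to (\\d+)\\] --> (.+)$");

    // Normalizes each recognized line to "begin-end|text|value"; other lines are ignored.
    static Set<String> parse(String output) {
        Set<String> result = new LinkedHashSet<>();
        for (String line : output.split("\\R")) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                result.add(m.group(2) + "-" + m.group(3) + "|" + m.group(1) + "|" + m.group(4));
            }
        }
        return result;
    }

    // Returns the entries present in `expected` but absent from `actual`.
    static Set<String> missing(Set<String> expected, Set<String> actual) {
        Set<String> diff = new LinkedHashSet<>(expected);
        diff.removeAll(actual);
        return diff;
    }

    public static void main(String[] args) {
        // Output reported for Java 8 in this issue.
        String java8 = String.join("\n",
            "annual [from char offset 237 to 243] --> P1Y",
            "August [from char offset 343 to 349] --> 2013-08",
            "this year [from char offset 380 to 389] --> 2013");
        // Assumed Java 11 output: the same lines without "annual".
        String java11 = String.join("\n",
            "August [from char offset 343 to 349] --> 2013-08",
            "this year [from char offset 380 to 389] --> 2013");
        System.out.println(missing(parse(java8), parse(java11)));
    }
}
```

Comparing normalized sets rather than raw text keeps the check insensitive to line order and blank lines, so only genuinely missing or changed expressions are reported.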

For what it is worth, if the same test is run using stanford-corenlp 4.2.0, "annual" appears in the output of both runs.  However, we are reluctant to switch to the newer version because of other changes that it has.

Does anyone know the reason for this behavior?  Is there something we can backport to the older version so that we can get consistent output?

Thank you,

AngledLuffa commented 3 years ago

I don't want to be discouraging, but there is basically zero chance we do any work on 3.9.2 except possibly for a corporate licensee. Is there some specific concern you have regarding 4.2.0?


kwalcock commented 3 years ago

We are evaluating the performance of the newer version to see what kind of problems we might encounter. However, it would be good to know the reason for the strange behavior in the older version in case it has not been fully addressed. It's only by chance that we noticed it failing in one way, and maybe just as much by chance that the problem went away.

AngledLuffa commented 3 years ago

The most likely explanation is that since the models have been retrained, there is a different recognition of a particular word sequence leading to different labels.

You can see the release history here:

https://stanfordnlp.github.io/CoreNLP/history.html

kwalcock commented 3 years ago

There may be some kind of issue for us with tag sets, but I don't know the details yet. Retraining explains differences between versions 3.9.2 and 4.2.0, but it doesn't explain why 3.9.2 with Java 8 differs from 3.9.2 with Java 11.

AngledLuffa commented 3 years ago

That is very true. Which JDK?

Although honestly there have been enough bugs like this which I fixed in the past and forgot, or which someone else fixed and I didn't know about, that I really am not planning on investigating an error in a version a couple years old at this point. I'd hate to spend a couple hours chasing it only to discover it was solved all along. If it comes up with 4.2.0, we'll definitely investigate.

kwalcock commented 3 years ago

That sounds frustrating. FWIW the Java 11 version that wasn't finding "annual" with 3.9.2 was Oracle jdk-11.0.10. I'm not really concerned about whether that or other things are found or not, but do need to get the same answer each time.
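
Because the result depends on which JVM runs the test, it may help to log the JVM identity next to each test result. The following is a minimal sketch using only standard system properties; the class name `JvmFingerprint` is hypothetical.

```java
// Hypothetical helper, not part of CoreNLP: describes the running JVM for test logs.
class JvmFingerprint {
    // Builds a one-line description from standard system properties.
    static String describe() {
        return System.getProperty("java.vendor") + " "
            + System.getProperty("java.version") + " ("
            + System.getProperty("java.vm.name") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Recording this line with each run makes it easier to tell whether a changed result follows the JDK vendor, the Java version, or neither.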

AngledLuffa commented 3 years ago

If you find any inconsistencies in 4.2.0, we will get to work on fixing those right away. Alternatively, as you find situations where 4.2.0 isn't suitable for your needs but 3.9.2 is, perhaps we can work to address that.