stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0
9.71k stars 2.7k forks source link

Upgrade Java Servlet API 3.0.1 to Jakarta Servlet API #1269

Open sandeepnkulkarni opened 2 years ago

sandeepnkulkarni commented 2 years ago

Currently CoreNLP project uses old version of Java Servlet API 3.0.1. It was released way back in 2011. Security scanner is showing it as EOL version. Is it possible to upgrade to newer Jakarta Servlet API 4.0.4 at least? It supports Java 8, so it will not break any backward compatibility. There are guide available online about moving from javax to jakarta namespace.

AngledLuffa commented 2 years ago

What happens to any projects which happen to be using the old stuff still if we update? Although I suppose the opposite argument works for projects which have migrated and still want to use CoreNLP.

Also, how would updating this interact with things such as Tomcat?

Ultimately I think we'd be happy to take a PR with this change, but our threshold of doing it ourselves would be "super bored and nothing else going on"

eduarddrenth commented 1 year ago

Staying in sync with developments in the stuff (java in this case) you use imo is a must. Jakarta ee 10 and the new jdk versions have a lot to offer. I would upvote to systematically invest time in this. Now you are glueing users of your great library to old libraries.

AngledLuffa commented 1 year ago

You're not wrong. The issue is that there is one person qualified and available to do it, and unfortunately they do not fall in the "motivated" circle on that Venn diagram.

Maybe I ... er ... that person will try to find time before the end of the year

eduarddrenth commented 1 year ago

Maybe I can help, I have a lot of experience in java/xml/ee/jaxb etc. I do not have a lot of time though Is the xml part in the lib restricted to the configuration?

AngledLuffa commented 1 year ago

If only we could find a person with time :)

I know there's XML used internally as part of the SUTime representation, but I don't believe that's part of the java / jakarta issue

Maybe I can start digging into compatibility some today and see how much work it really is

AngledLuffa commented 1 year ago

I don't think any of these uses are particularly challenging, so hopefully they can all be replaced with the newer version:

src/edu/stanford/nlp/ie/ner/webapp/NERServlet.java
src/edu/stanford/nlp/naturalli/demo/CORSFilter.java
src/edu/stanford/nlp/naturalli/demo/OpenIEServlet.java
src/edu/stanford/nlp/patterns/SPIEDServlet.java
src/edu/stanford/nlp/pipeline/webapp/CoreNLPServlet.java
src/edu/stanford/nlp/time/suservlet/SUTimeServlet.java
AngledLuffa commented 1 year ago

Oh, even the namespaces are apparently the same in servlet 4.0.4? This is a lot easier than I thought

AngledLuffa commented 1 year ago

https://github.com/stanfordnlp/CoreNLP/pull/1388

(will have to test it, including bringing back the currently defunct nlp.stanford.edu:8080 demos)

AngledLuffa commented 1 year ago

We're running Tomcat 9, which I read uses servlet api 4. I believe that means upgrading shouldn't cause any problems with someone adding a feature based on the newer Jakarta version of the servlet which doesn't actually work in our Tomcat installation. (Not that that is a likely event anyway.) So, I merged this into dev, and it should be part of the next CoreNLP release, whenever that is.

eduarddrenth commented 1 year ago

Nice, this may even work with ee 10 jdk 17, can I try that when building from the dev branch?

AngledLuffa commented 1 year ago

Yes, but we would request that you first sign a consent form acknowledging that we take no responsibility if your computer explodes

eduarddrenth commented 1 year ago

:-)

Op vr 13 okt. 2023 17:59 schreef John Bauer @.***>:

Yes, but we would request that you first sign a consent form acknowledging that we take no responsibility if your computer explodes

— Reply to this email directly, view it on GitHub https://github.com/stanfordnlp/CoreNLP/issues/1269#issuecomment-1761750148, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACN2KKGPYE5MN4U3BJCFOE3X7FQP3ANCNFSM5V2TCLTQ . You are receiving this because you commented.Message ID: @.***>

eduarddrenth commented 1 year ago

Tested this return new Sentence(form).lemmas(), works nicely in payara 6 ee 10. Built corenlp from dev using pom-java-17.xml.

eduarddrenth commented 11 months ago

Now I include in pom.xml:

            <dependency>
                <groupId>edu.stanford.nlp</groupId>
                <artifactId>stanford-corenlp</artifactId>
                <version>4.5.5</version>
            </dependency>

But I have to git clone and mvn install -f pom-java-17.xml first.

Will this jakarta update make it to maven central somtime?

Regards, Eduard

AngledLuffa commented 11 months ago

eventually, yes, but probably early january

eduarddrenth commented 11 months ago

Good to know, thanks