pemistahl / lingua

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Apache License 2.0
706 stars, 63 forks

Rewrite Lingua in Java #188

Closed: pemistahl closed this issue 1 month ago

pemistahl commented 1 year ago

As I most probably won't have much time in the future to maintain Lingua due to my family life, let's rewrite Lingua in Java while I still have time. My hope is to attract more people from the Java community to use and improve this library in the future. Having a plain Java library should also simplify long-term maintenance, thanks to Java's backwards compatibility.

Kotlin is a great language but a second-class citizen on the JVM. Its use did not provide significant advantages compared to a plain Java implementation. I have doubts whether Kotlin will continue to have a bright future as both Java as a language and the JVM get significantly better, especially with the release of Java 21.

rogierslag commented 9 months ago

Hi @pemistahl!

I really like this library and have started migrating the core library itself (excluding reports etc. for now) to Java. It's still a work in progress, but I'll file a PR once the test suite is fully passing.

If you're interested, you can find the in-progress branch here.

pemistahl commented 9 months ago

Hi Rogier, wow, it's awesome that you are voluntarily taking on this tedious work. It looks very good so far. Thanks a lot. :)

Compared to my other implementations of Lingua, this one has been stuck because I have not yet been able to motivate myself to do the rewrite. If your work is successful, I will then add the new features from the other implementations to the Java rewrite. So please make sure to do a 1-to-1 rewrite of the current Kotlin main branch only. Again, thank you very much. :)

sergeykad commented 1 month ago

Hi, is there a chance for future updates for this library, or is the project closed for good?

pemistahl commented 1 month ago

@sergeykad As you may have seen already, I've been maintaining four implementations of this library (Kotlin, Go, Rust, Python). Since the birth of my son, my spare time for updating all these implementations has been very limited. My plan is to update only the Rust and Python implementations in the future. There will be new releases for both later this year. For Go and the JVM, I will replace the native implementations with bindings to a WebAssembly (WASM) module created from the Rust implementation. Rust is the best language for this kind of library because it has the highest speed of execution and the lowest memory requirements.

pemistahl commented 1 month ago

@rogierslag I'm assuming that you won't continue to port my library to Java, right? There has not been any progress on your side since your initial post.

I'm closing this issue now in favor of #214.