pemistahl / lingua

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Apache License 2.0
706 stars 63 forks source link

Reduce resources to load language models #157

Closed pemistahl closed 1 month ago

pemistahl commented 1 year ago

Currently, the language models are parsed from json files and loaded into simple maps at runtime. Even though accessing the maps is pretty fast, they consume a significant amount of memory. The goal is to investigate whether there are more suitable data structures available that require less storage space in memory, something like NumPy for Python. Perhaps it is even possible to store those data structures in some kind of binary format on disk which can be loaded faster than the current json files.

Promising candidates could be:

pemistahl commented 1 month ago

Closed in favor of #214.