sylvainhalle / textidote

Spelling, grammar and style checking on LaTeX documents
https://sylvainhalle.github.io/textidote
GNU General Public License v3.0

n-grams analysis (using `--languagemodel`) gives `java.util.ServiceConfigurationError` #59

Open sim590 opened 5 years ago

sim590 commented 5 years ago

When running the following command (file read from stdin):

textidote --languagemodel /path/containing/fr/directory --html --dict .ltignore --check fr

I end up with the following error:

Using N-grams from /home/simon/Téléchargements
TeXtidote v0.7 - A linter for LaTeX documents and others
(C) 2018-2019 Sylvain Hallé - All rights reserved

Exception in thread "main" java.util.ServiceConfigurationError: Cannot instantiate SPI class: org.apache.lucene.codecs.lucene50.Lucene50Codec
    at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:82)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:51)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:38)
    at org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:47)
    at org.apache.lucene.codecs.Codec.forName(Codec.java:113)
    at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:469)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:361)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:53)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:242)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:230)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.getCachedLuceneSearcher(LuceneSingleIndexLanguageModel.java:183)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.addIndex(LuceneSingleIndexLanguageModel.java:119)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.<init>(LuceneSingleIndexLanguageModel.java:93)
    at org.languagetool.languagemodel.LuceneLanguageModel.<init>(LuceneLanguageModel.java:65)
    at org.languagetool.language.French.getLanguageModel(French.java:132)
    at org.languagetool.JLanguageTool.activateLanguageModelRules(JLanguageTool.java:341)
    at ca.uqac.lif.textidote.rules.CheckLanguage.activateLanguageModelRules(CheckLanguage.java:241)
    at ca.uqac.lif.textidote.Main.mainLoop(Main.java:546)
    at ca.uqac.lif.textidote.Main.mainLoop(Main.java:124)
    at ca.uqac.lif.textidote.Main.main(Main.java:110)
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist.  You need to add the corresponding JAR file supporting this SPI to your classpath.  The current classpath supports the following names: [Lucene40, Lucene41]
    at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:114)
    at org.apache.lucene.codecs.PostingsFormat.forName(PostingsFormat.java:112)
    at org.apache.lucene.codecs.lucene50.Lucene50Codec.<init>(Lucene50Codec.java:155)
    at org.apache.lucene.codecs.lucene50.Lucene50Codec.<init>(Lucene50Codec.java:75)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at java.base/java.lang.Class.newInstance(Class.java:584)
    at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:72)
    ... 23 more

The n-grams files were downloaded from the following URL:

https://languagetool.org/download/ngram-data/

as instructed by the LanguageTool documentation page.
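
The `Caused by` line above reports that the classpath only knows the `PostingsFormat` names `Lucene40` and `Lucene41`, which suggests that the matching service file inside the JAR does not list the Lucene 5.0 provider. One way to check this without any Java tooling is to read the `META-INF/services` entry directly. This is a minimal sketch, not part of TeXtidote. The helper name `list_spi_providers` is hypothetical, and the sketch assumes the service files sit at the top level of the JAR under `META-INF/services/`.

```python
import zipfile


def list_spi_providers(jar_path, service):
    """Return the provider class names registered for `service` in a JAR.

    `service` is the fully qualified interface name, for example
    "org.apache.lucene.codecs.PostingsFormat". An empty list is returned
    when the JAR has no such service file.
    """
    entry = "META-INF/services/" + service  # assumed location inside the JAR
    with zipfile.ZipFile(jar_path) as jar:
        if entry not in jar.namelist():
            return []
        text = jar.read(entry).decode("utf-8")

    providers = []
    for line in text.splitlines():
        # Service files allow '#' comments and blank lines; skip both.
        line = line.split("#", 1)[0].strip()
        if line:
            providers.append(line)
    return providers
```

Calling `list_spi_providers("textidote.jar", "org.apache.lucene.codecs.PostingsFormat")` (the file name is a placeholder) returns class names rather than the short names shown in the error message. If no class from the `org.apache.lucene.codecs.lucene50` package appears in the result, that would be consistent with the exception above.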

sylvainhalle commented 5 years ago

The service provider files under META-INF/services inside the JAR file need to be edited; solution here: https://anwaarlabs.wordpress.com/2017/02/25/lucene-an-spi-class-of-type-org-apache-lucene-codecs-codec-with-name-does-not-exist/

I can patch the existing release, but I'll have to think of a way to automate this for future releases.

sylvainhalle commented 5 years ago

Release v0.7.1 should fix the problem. Feel free to reopen if the problem persists.

sim590 commented 5 years ago

I just tested 0.7.1. It worked fine! Thanks for the quick reaction!

inventionate commented 2 years ago

Thank you for the great program. Unfortunately, I get exactly this error when using the latest version (0.8.3) with the current n-gram data. Is there a solution for this?

sylvainhalle commented 2 years ago

The new LanguageTool jar seems to have the exact same issue as the previous one. I'll reopen and try to fix it again.

Jollywatt commented 2 years ago

I’m getting this exact error, but can’t fix it even after following the instructions at https://anwaarlabs.wordpress.com/2017/02/25/lucene-an-spi-class-of-type-org-apache-lucene-codecs-codec-with-name-does-not-exist/.

Is there a workaround simple enough for non-Java users?

bong0 commented 1 year ago

I "patched" the 0.8.3 version manually and it works for me. Use at your own risk. I edited three files in META-INF/services using vim.

https://transfer.sh/64Oy0T/textidote_patched.jar

sylvainhalle commented 1 year ago

@bong0 Thanks for your contribution. I would like to integrate your changes in the pipeline that creates the LanguageTool fat JAR. Could you please tell me which files you modified and what changes you made to them?

bong0 commented 1 year ago

Sure, hope that helps. I added the lines listed in the diffs below. Let me know if anything is unclear :)

[changed] META-INF/services/org.apache.lucene.codecs.PostingsFormat

❯ diff t1/**/META-INF/services/org.apache.lucene.codecs.PostingsFormat  t2/**/META-INF/services/org.apache.lucene.codecs.PostingsFormat
17a18
> org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat

[changed] META-INF/services/org.apache.lucene.codecs.DocValuesFormat

❯ diff t1/**/META-INF/services/org.apache.lucene.codecs.DocValuesFormat  t2/**/META-INF/services/org.apache.lucene.codecs.DocValuesFormat
20a21
> org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat

[changed] META-INF/services/org.apache.lucene.codecs.Codec

❯ diff t1/**/META-INF/services/org.apache.lucene.codecs.Codec  t2/**/META-INF/services/org.apache.lucene.codecs.Codec           
24a25
> org.apache.lucene.codecs.lucene54.Lucene54Codec
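
For anyone who wants to apply the three additions above without editing the archive by hand, the change can be sketched as a small script. This is an illustrative sketch under stated assumptions, not the project's build pipeline. The function name `add_spi_providers` and the constant `ADDITIONS` are hypothetical, the service files are assumed to live at the top level of the JAR under `META-INF/services/`, and only service files that already exist in the JAR are touched.

```python
import zipfile

# Provider lines taken from the three diffs above, keyed by service interface.
ADDITIONS = {
    "org.apache.lucene.codecs.PostingsFormat": [
        "org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat",
    ],
    "org.apache.lucene.codecs.DocValuesFormat": [
        "org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat",
    ],
    "org.apache.lucene.codecs.Codec": [
        "org.apache.lucene.codecs.lucene54.Lucene54Codec",
    ],
}


def add_spi_providers(src_jar, dst_jar, additions):
    """Copy src_jar to dst_jar, appending missing provider lines.

    `additions` maps a service interface name to the provider class names
    that should be registered for it. Every other entry is copied unchanged.
    """
    targets = {"META-INF/services/" + name: classes
               for name, classes in additions.items()}
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info)
            if info.filename in targets:
                lines = data.decode("utf-8").splitlines()
                # Compare against existing entries, ignoring '#' comments.
                present = {l.split("#", 1)[0].strip() for l in lines}
                for cls in targets[info.filename]:
                    if cls not in present:
                        lines.append(cls)
                data = ("\n".join(lines) + "\n").encode("utf-8")
            # Reuse the original entry metadata (name, timestamp, compression).
            dst.writestr(info, data)
```

A call such as `add_spi_providers("textidote.jar", "textidote_patched.jar", ADDITIONS)` (both file names are placeholders) writes a new JAR and leaves the original untouched, so the result can be tested before replacing anything. As with the manually patched file, use at your own risk.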