wysiib / linter-languagetool

Integration of LanguageTool into the Atom text editor.
MIT License
17 stars 5 forks

How to use n-gram dataset ? #17

Closed jeanrjc closed 6 years ago

jeanrjc commented 6 years ago

Hi!

I downloaded the n-gram dataset, and put it in a directory like this: /path/to/dir/en/ which contains 3 folders: 1grams, 2grams, 3grams, and put the path /path/to/dir/ into the field "Path to the n-gram directory".

I managed to get it working with the standalone version (languagetool.jar) but not in Atom. I restarted Atom, but that did not help either.

I have Atom 1.20.0 and linter-languagetool v0.5.2

Thanks

wysiib commented 6 years ago

I thought you just needed to set the path to the directory. Which command line options did you use for the standalone version? I'll compare them with the ones the Atom plugin is using.

jeanrjc commented 6 years ago

In the standalone version, there is a GUI field for setting the path, like in this package, so I entered the same path, but nothing is reported in Atom.

hesstobi commented 6 years ago

This setting is only applied when the languagetool-server is restarted. With the current version of Atom, the server stays alive even if Atom is restarted. So kill the process and retry.
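
For reference, a minimal Python sketch of the "kill the process and retry" step. It assumes a Unix-like system where `ps x` prints the PID in the first column; the helper names `find_server_pids` and `kill_servers` are made up for illustration and are not part of the plugin.

```python
import os
import signal
import subprocess


def find_server_pids(ps_output, marker="languagetool-server.jar"):
    """Return the PIDs of lines in `ps x` output whose command mentions the server jar."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # The header line ("PID TT STAT ...") is skipped because its first field is not a number.
        if fields and fields[0].isdigit() and marker in line:
            pids.append(int(fields[0]))
    return pids


def kill_servers():
    """Send SIGTERM to every LanguageTool server process owned by the current user."""
    ps_output = subprocess.run(
        ["ps", "x"], capture_output=True, text=True, check=True
    ).stdout
    for pid in find_server_pids(ps_output):
        os.kill(pid, signal.SIGTERM)
```

Calling `kill_servers()` and then restarting Atom should make the plugin start a fresh server, provided the plugin launches the server itself as described here.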

jeanrjc commented 6 years ago

I killed the process associated with languagetool-server.jar from the terminal and restarted Atom, but it still does not work.

But actually I think the command you use is not correct. In the doc, it's written:

  • Command line: start with the --languagemodel option pointing to the ngram-index directory.
  • Server mode: start with the --config file option. This properties file needs to have a languageModel=… entry pointing to the ngram-index directory.

And the process you launch uses the --languageModel option, and should apparently use the --config file option. This is what I have:

$ ps x | grep languagetool
19770   ??  S      0:55.29 /usr/bin/java -cp /Users/foo/bar/bin/LanguageTool-3.8/languagetool-server.jar org.languagetool.server.HTTPServer --languageModel /Users/foo/bar/bin/LanguageTool-3.8/ngrams-en-20150817/

wysiib commented 6 years ago

Maybe we should replace this preference with one that takes the location of a configuration file. I don't think we should write that file on demand.

jeanrjc commented 6 years ago

As long as it works, why not! You could just give an example config file so people only have to copy-paste it and set its path in the preferences.

wysiib commented 6 years ago

I just published version 0.6.0, which replaces the n-gram path option with an option to set the path to a configuration file.

jeanrjc commented 6 years ago

OK, thanks. What should the configuration file look like?

It would be nice to update the README and rename the field to something like "N-gram config file path".

jeanrjc commented 6 years ago

Ah sorry, I didn't see there was a "Path to a config file" field. Still, one could add "N-gram" to this field name and remove the old "N-Gram Data Path" field.

wysiib commented 6 years ago

I'm not sure the config file is only for providing n-gram data. I guess you can configure other parameters of your LanguageTool server with it too. Inside the file, you need to use languageModel=/path/to/ngram/data.

The old option has been removed in the source code. I guess you still see it because it is stored in your Atom options.

jeanrjc commented 6 years ago

So I tried what you proposed, which is also what I expected and what I found on the internet, but it doesn't work.

I have a file ngram_atom.config containing:

# Path to N-gram directory
languageModel=/Users/foo/bar/bin/ngrams-en-20150817/

In the settings, the "Path to a config file" field is set to: /Users/foo/bar/bin/ngram_atom.config

And the command launched is:

$ ps x | grep language
50098   ??  S      1:00.28 /usr/bin/java -cp /Users/foo/bar/bin/LanguageTool-3.9-SNAPSHOT/languagetool-server.jar org.languagetool.server.HTTPServer --config /Users/foo/bar/bin/ngram_atom.config
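
A hedged Python sketch of the setup described above: writing the one-line properties file and assembling the server command that `ps` shows. The function names are hypothetical and the paths are passed in rather than assumed; only the `languageModel` key, the `org.languagetool.server.HTTPServer` class, and the `--config` option are taken from this thread.

```python
from pathlib import Path


def write_config(config_path, ngram_dir):
    """Write a minimal server config whose only entry points at the n-gram directory."""
    Path(config_path).write_text(
        "# Path to N-gram directory\n"
        f"languageModel={ngram_dir}\n",
        encoding="utf-8",
    )


def build_server_command(server_jar, config_path, java="java"):
    """Build the argument list for starting the HTTP server with a config file."""
    return [
        java,
        "-cp", str(server_jar),
        "org.languagetool.server.HTTPServer",
        "--config", str(config_path),
    ]
```

The returned list can be handed to `subprocess.Popen` to start the server by hand, which makes it possible to test the config file independently of Atom.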

I quit Atom, killed this process, and restarted Atom, but it didn't work either. Does it work on your side?

wysiib commented 6 years ago

It is working for me. Note that you need to adhere to a specific directory structure; see the third item in the enumeration on http://wiki.languagetool.org/finding-errors-using-n-gram-data. I guess you do not have a directory named en inside of the ngrams-en-20150817 folder?
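
A small Python sketch for checking that layout before pointing the server at it. It assumes the structure mentioned in the first post of this thread (a language folder such as `en` containing `1grams`, `2grams`, and `3grams`); the function name `missing_ngram_dirs` is hypothetical.

```python
from pathlib import Path

# Sub-directories expected inside the language folder, as described in this thread.
EXPECTED_SUBDIRS = ("1grams", "2grams", "3grams")


def missing_ngram_dirs(language_model_dir, lang="en"):
    """Return the expected directories that are missing; an empty list means the layout looks right."""
    lang_dir = Path(language_model_dir) / lang
    if not lang_dir.is_dir():
        # Without the language folder there is no point in checking further.
        return [str(lang_dir)]
    return [
        str(lang_dir / name)
        for name in EXPECTED_SUBDIRS
        if not (lang_dir / name).is_dir()
    ]
```

Pass the same directory that the `languageModel` entry points to, i.e. the folder that contains `en`, not the `en` folder itself.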

jeanrjc commented 6 years ago

Yes, I do. My directory structure looks like this:

/Users/foo/bar/bin/ngram_atom.config
/Users/foo/bar/bin/ngrams-en-20150817
                                     \_ en
                                         \_ 1grams
                                         \_ 2grams
                                         \_ 3grams

jeanrjc commented 6 years ago

What does your config file look like?

wysiib commented 6 years ago

It's just

# Path to N-gram directory
languageModel=/Users/abc/xyz/ngrams

where ngrams is the folder in which the en folder resides.

jeanrjc commented 6 years ago

Hum, that's weird.

When you write "There last chance", does it report something like "Statistics suggests that ..."?

Which version of LanguageTool do you have? I tried 3.8 and 3.9-SNAPSHOT (from 26 Sept.), but neither worked.

wysiib commented 6 years ago

Indeed, I get a "Statistics suggest..." error report. I'm using LanguageTool version 3.8, installed on Mac via Homebrew.

Edit: The only immediate difference I see between our configurations is the trailing slash...?

jeanrjc commented 6 years ago

Ah, I got it to work !!

I had the category TYPOS disabled, which appears to prevent the n-gram check from working. Anyway, I'll deal with that. Still, it would be nice to be able to disable both by category and by rule, so I could block MORFOLOGIK_RULE US, for instance, without preventing n-gram from working. [edit: I found a workaround, which I posted in a separate issue to make it easier for others to find (see issue #18)]

Thanks for your replies !
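
To compare the two cases outside Atom, here is a hedged Python sketch that sends the test sentence to a locally running server with either a category or a single rule disabled. Assumptions, none of them confirmed in this thread: the server listens on `http://localhost:8081`, it exposes the `/v2/check` endpoint with the `disabledRules` and `disabledCategories` parameters, and `MORFOLOGIK_RULE_EN_US` is the full ID behind "MORFOLOGIK_RULE US". The function names are made up.

```python
import json
import urllib.parse
import urllib.request


def build_check_body(text, language="en-US", disabled_rules=(), disabled_categories=()):
    """Encode the form body for a check request, adding the disable lists only when given."""
    params = {"text": text, "language": language}
    if disabled_rules:
        params["disabledRules"] = ",".join(disabled_rules)
    if disabled_categories:
        params["disabledCategories"] = ",".join(disabled_categories)
    return urllib.parse.urlencode(params).encode("utf-8")


def check(text, base_url="http://localhost:8081", **kwargs):
    """POST the text to the (assumed) local server and return the list of matches."""
    request = urllib.request.Request(
        base_url + "/v2/check", data=build_check_body(text, **kwargs)
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["matches"]
```

For example, the results of `check("There last chance", disabled_categories=["TYPOS"])` and `check("There last chance", disabled_rules=["MORFOLOGIK_RULE_EN_US"])` can be compared to see which of the two still yields the statistics-based match.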

hesstobi commented 6 years ago

So the cause was not the old command-line option. The question, then, is whether we can bring back the old option, which was working fine (although it is not documented in the wiki on languagetool.org). I think it is much easier for the user to provide only the path of the directory instead of creating a config file.

wysiib commented 6 years ago

We still need to check that it wasn't both the old command-line option and the disabled rule! If the command-line option works (despite LT's documentation), we can easily revert the commit and switch back.