wysiib / linter-languagetool

Integration of Languagetool into the Atom text editor.
MIT License
17 stars 5 forks source link

Improve Markup Language Support #7

Open hesstobi opened 7 years ago

hesstobi commented 7 years ago

This linter is doing a great job. When writing a document in a markup language like LaTeX it could be improved, because it shows errors on every LaTeX command. As a quick and dirty solution I added the following lines:

  # Replace LaTeX commands with spaces of equal length so offsets are preserved.
  editorContent = editorContent.replace /(\\\w+)((?:\{[^\}]*\})*)((?:\[[^\]]*\])*)((?:\{[^\}]*\})*)/g, (match, name, group1, group2, group3, index, input) ->
    if /\\(\w*section|\w*caption|text\w*|mbox)/.test(name)
      # Keep the prose inside the braces; blank out the command name,
      # the braces themselves, and any optional [...] arguments.
      output = Array(name.length + 1).join(" ") +
        group1.replace(/[\{\}]/g, " ") +
        Array(group2.length + 1).join(" ") +
        group3.replace(/[\{\}]/g, " ")
    else
      # Blank out the entire command.
      output = Array(match.length + 1).join(" ")
    return output

This replaces a large part of the LaTeX markup with spaces. I then disabled the WHITESPACE_RULE. A more general approach would be to ignore grammar scopes and patterns via an API like the one linter-spell provides.

wysiib commented 7 years ago

We should go for a proper solution following the linter-spell one. Will look at it in the coming days.

wysiib commented 7 years ago

I am unsure whether the core plugin should include language-specific features. The same goes for issue #7. However, I am not sure about an API for connecting language definitions as separate packages either. Any suggestions?

zoenglinghou commented 6 years ago

linter-spell-latex actually compiles a list of excluded scopes for LaTeX. Might be helpful.

wysiib commented 6 years ago

We have thought about porting the solution from the linter-spell package for quite some time. I am currently switching jobs and thus do not have the time to implement things myself, but I will look into it, probably around the end of May.

29antonioac commented 5 years ago

Hi! How could I use this workaround until a final solution is found?

Thanks!

hesstobi commented 5 years ago

You can use my branch, which adds basic support for markup languages using the linter-spell API. I use it a lot for LaTeX. There are still a lot of things missing... https://github.com/hesstobi/linter-languagetool/tree/linter-spell-api

29antonioac commented 5 years ago

Thanks for your work! It works pretty well :).

Only one question: in my documents the command \gls{} for handling acronyms is not correctly filtered. Is this a problem in your plugin or in linter-spell?

Thanks for all!

73 commented 5 years ago

I would like to give this thumbs up. Support for LaTeX would be so awesome!

davidlday commented 5 years ago

I don't know if this helps or not, but the LanguageTool server now supports processing annotated text. I'm not sure exactly when they implemented it. You can see the data parameter of the API at SwaggerHub for an example. It takes a value like:

{"annotation": [
  {"text": "A "},
  {"markup": "<b>"},
  {"text": "test"},
  {"markup": "</b>"}
]}

Using the linter-spell approach, perhaps the different formats could be mapped to this annotated format? This would preserve offsets, I believe, and would potentially be easier than trying to reduce everything to pure text.

wysiib commented 5 years ago

That sounds like another nice way to proceed. I agree, reducing to pure text while keeping offsets intact might be quite a hassle. However, I haven't found a list of "all" the annotations in, say, LaTeX. Could this be derived from the language tokens Atom creates anyway? @hesstobi since this is somewhat related to what you are doing: any input?

hesstobi commented 5 years ago

Yes, I think this is a good way to go, but I currently do not have any time to work on it.

davidlday commented 5 years ago

I created a few stand-alone packages that convert markup into LanguageTool's annotated text that might help:

My quick search for a LaTeX parser turned up a couple of packages, but also several SO posts on how challenging it is to create a parser. If you all know of a good parser, I can see about creating another package to handle it. Or you're free to leverage the above to create one as well. :)

hesstobi commented 5 years ago

Nice work! But I think this is more useful outside of Atom, because there you need a parser for every grammar, while Atom already includes parsers for all major grammars. With the linter-spell API it is possible to choose which scopes should be checked by LanguageTool. This would also enable LanguageTool to check comments in programming languages, and so on.

davidlday commented 5 years ago

Thank you. I see where I misunderstood the parsing in Atom. Should have looked a little closer. :( Anyhow, I'll dig in a little deeper on the grammars & linter-spell as I have time and see if I can help out.

davidlday commented 5 years ago

I've been watching/commenting on an issue on atom-wordcount that feels like a similar problem: basically, eliminating all non-natural-language text from a document's word count. Getting tokenized lines seems to be possible using Atom's public API:

# Tokenize the whole buffer with the grammar assigned to the editor
editorGrammar = editor.getGrammar()
editorGrammar.tokenizeLines(editor.getText())

See the early snippet in the issue for an example of filtering out scopes using first-mate. This doesn't work for tree-sitter grammars, but a similar approach should be possible.
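The scope-filtering step could be sketched like this. The fixture below stands in for real tokenizer output (first-mate tokens carry a `value` string and a `scopes` array); the excluded-scope list and function name are assumptions:

```javascript
// Blank out tokens whose scopes mark them as markup/code, keeping
// natural-language tokens, so offsets into the buffer are preserved.
const EXCLUDED_SCOPES = ['support.function', 'punctuation.definition', 'comment.line'];

function filterTokenizedLines(lines) {
  return lines.map(tokens =>
    tokens.map(token => {
      const excluded = token.scopes.some(scope =>
        EXCLUDED_SCOPES.some(prefix => scope.startsWith(prefix)));
      return excluded ? ' '.repeat(token.value.length) : token.value;
    }).join('')
  ).join('\n');
}

// Hand-made fixture standing in for grammar.tokenizeLines(...) output:
const lines = [[
  { value: '\\section', scopes: ['text.tex.latex', 'support.function.section.latex'] },
  { value: '{',         scopes: ['text.tex.latex', 'punctuation.definition.arguments.begin.latex'] },
  { value: 'Intro',     scopes: ['text.tex.latex'] },
  { value: '}',         scopes: ['text.tex.latex', 'punctuation.definition.arguments.end.latex'] },
]];
console.log(JSON.stringify(filterTokenizedLines(lines)));
```

Whether these particular scope prefixes are the right cut-off would need tuning per grammar, which is exactly what a linter-spell-style API would let language packages decide.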

hesstobi commented 5 years ago

This is the API we need! I added it to #23. But we should also find a way to do this for tree-sitter grammars.

mbroedl commented 5 years ago

@hesstobi Have a look at this commit where I try to use the editor.tokensForScreenRow() API. Note that this API is undocumented and thus subject to change! (See also the discussion in atom-wordcount again.)