add support for markup language

n3f4s commented 7 years ago

It would be nice to have support for markup languages like LaTeX or Markdown. Using grammarous with those languages raises a lot of errors due to the tags and automatic styling (mostly for LaTeX, because the LaTeX compiler handles a lot of style issues).

Furthermore, having things like pieces of code in the text makes grammarous less pleasant to use because of the number of errors raised. Supporting those markup languages would mean being able to disable spell checking for things like the verbatim environment in LaTeX or code quotes (``) in Markdown.

I've asked LanguageTool if they could add support for markup languages; their answer is that it's up to the editor to handle the markup language parsing.

This looks like an enhanced version of #10 (if I've understood the issue).

An idea could be to have a dictionary of commands (by filetype). Those commands would parse the file and return the raw text, without the markup and the ignored text (like the content of the verbatim environment).
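
A minimal sketch of that idea in Python, for illustration only. The names STRIPPERS, strip_markdown, strip_latex and raw_text are made up here, and the regular expressions only cover inline code and the verbatim environment:

```python
import re

def strip_markdown(text):
    # Drop inline code spans such as `foo()`; everything else is kept.
    return re.sub(r"`[^`\n]*`", "", text)

def strip_latex(text):
    # Drop verbatim environments, including their content.
    pattern = r"\\begin\{verbatim\}.*?\\end\{verbatim\}"
    return re.sub(pattern, "", text, flags=re.DOTALL)

# Hypothetical registry: filetype -> function returning the text to check.
STRIPPERS = {
    "markdown": strip_markdown,
    "tex": strip_latex,
}

def raw_text(filetype, text):
    # Unknown filetypes are passed through unchanged.
    stripper = STRIPPERS.get(filetype, lambda t: t)
    return stripper(text)
```

Note that removing text like this shifts the position of everything that follows it, so error locations reported for the raw text no longer line up with the original file.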

rhysd commented 7 years ago

> I've asked LanguageTool if they could add the support of markup language, their answer is that it's up to the editor to handle the markup language parsing.

Yeah, it would be nice to support Markdown or LaTeX as you said. However, there are some problems with doing that.

First, it's comparatively easy to strip Markdown into plain text. There are already some tools that do this, and I could also use a Markdown parser library via a scripting language interface like if_python.

The main problem is that the position (line, column, offset) of a grammatical error is no longer correct after stripping Markdown or LaTeX. LanguageTool returns the result for the stripped plain text, not for the original Markdown or LaTeX code. So we would need a source map to maintain the relation between positions in the Markdown or LaTeX code and positions in the converted plain text. As far as I could find by googling, there is no Markdown/LaTeX conversion tool that supports source maps. So I would need to write one, but I don't have the resources to do that currently.
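
To illustrate what such a source map would have to do, here is a minimal Python sketch. It strips only `*`, `_` and backtick characters (an assumption made to keep the example short, not how a real Markdown parser works) and records, for each character of the plain text, its offset in the original. The function names are hypothetical:

```python
def strip_with_sourcemap(source, markup_chars="*_`"):
    """Return (plain, offsets) where offsets[i] is the index in `source`
    of the character plain[i]."""
    plain_chars = []
    offsets = []
    for index, char in enumerate(source):
        if char in markup_chars:
            continue  # markup character: dropped from the plain text
        plain_chars.append(char)
        offsets.append(index)
    return "".join(plain_chars), offsets

def to_source_offset(offsets, plain_offset):
    # Translate an offset reported for the plain text back to the source.
    return offsets[plain_offset]

source = "This is **importent** text."
plain, offsets = strip_with_sourcemap(source)
error_at = plain.index("importent")  # where a checker would report the typo
print(plain)
print(to_source_offset(offsets, error_at))
```

The per-character list is the simplest possible representation; a real tool would more likely store ranges and would also have to handle line and column positions.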

Thank you for your suggestion, but my current opinion is that it's hard to support.

n3f4s commented 7 years ago

I understand that it's hard to support and that's why I initially made a request to LanguageTool. They could do the stripping/conversion and return the right position for the errors.

A solution might be to simply replace the "markup" with spaces/newlines and disable LanguageTool's formatting errors (spaces, newlines, ...). Most markup languages deal with formatting themselves. It would be harder for LaTeX (and other languages with macros) since some macros can expand to several words. But removing the formatting markup (*, _, \textit, ...) and the code blocks would at least remove a lot of false errors.
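
A minimal Python sketch of that replace-by-spaces approach, assuming only three kinds of markup (inline code, \textit{...} and the * and _ emphasis markers). The function names are made up, and an underscore inside a normal word would also be blanked, which a real implementation would need to avoid:

```python
import re

# Illustrative patterns only.
INLINE_CODE = re.compile(r"`[^`\n]*`")
TEXTIT = re.compile(r"\\textit\{([^}]*)\}")
EMPHASIS_MARKERS = re.compile(r"[*_]")

def blank(match):
    # Same number of characters, all spaces, so later offsets do not move.
    return " " * len(match.group(0))

def blank_textit(match):
    # "\textit{" is 8 characters; keep the argument, blank the wrapper.
    return " " * 8 + match.group(1) + " "

def mask_markup(line):
    line = INLINE_CODE.sub(blank, line)
    line = TEXTIT.sub(blank_textit, line)
    line = EMPHASIS_MARKERS.sub(" ", line)
    return line
```

Because every replacement has the same length as what it replaces, a position reported for the masked line is also valid in the original line.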

Then again, I understand it's not straightforward.

copyme commented 7 years ago

I would love to use your plugin with LaTeX documents. TeXstudio supports LanguageTool (not perfectly, but it is still useful). Maybe reading their code would give you some clues on how to implement something similar.

DerWeh commented 6 years ago

You already offer the possibility to ignore everything but comments, and I think I have read that you identify comments by the highlight group. If that is the case, would it be possible to do it the other way round and specify highlight groups we want to ignore? For LaTeX, I use e.g. vimtex, which defines highlight groups (I think a lot are already provided by Vim by default) for everything I would like to ignore (commands, math mode!!, curly brackets, comments, spacing ('\@', ...)). This would be the most convenient way, as we can pass the difficult task of covering the multitude of LaTeX specifiers to dedicated plug-ins and just fetch (maybe also by regular expression) the highlight groups we want to ignore from them.

languitar commented 6 years ago

For LaTeX, simply ignoring all commands (from highlight groups) could become a real issue. For instance, \emph{this word} is important for the sentence, while with \footnote{some footnote} the contained text should be parsed, but not at that position. Finally, with commands like \cite{Foo}, the contained arguments should probably be ignored entirely. So simply using highlight groups will most likely still produce some ugly errors.
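
The three cases can be sketched in Python as a per-command policy. This is only an illustration under simplifying assumptions: the POLICY table and split_text are made-up names, nested braces are not handled, and unknown commands are treated like \emph:

```python
import re

# Hypothetical policy table; a real document would need many more entries.
POLICY = {"emph": "keep", "footnote": "defer", "cite": "drop"}

COMMAND = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")

def split_text(line):
    """Return (main, deferred): `main` has the same length as `line`,
    `deferred` lists (source_offset, text) pairs to check separately."""
    deferred = []

    def handle(match):
        name, arg = match.group(1), match.group(2)
        policy = POLICY.get(name, "keep")
        prefix = " " * (len(name) + 2)  # blanks for "\name{"
        if policy == "keep":
            return prefix + arg + " "  # argument stays in the sentence
        if policy == "defer":
            deferred.append((match.start(2), arg))
        return " " * len(match.group(0))  # "defer" and "drop" leave blanks

    return COMMAND.sub(handle, line), deferred
```

The deferred texts would then be checked on their own, with the stored offset added to any position reported for them.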

I hacked a special-purpose solution for my own needs with LaTeX by adding a preprocessor that replaces most of the commands I frequently use with the plain-text representations + enough spaces to avoid shifting the result locations.

I have integrated this into the plugin by locally setting

let g:grammarous#languagetool_cmd = getcwd() . '/detex-languagetool'

where detex-languagetool is a simple wrapper like:

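#!/usr/bin/env python3
# Wrapper: strip LaTeX with detex.py, then pipe the result into languagetool.

import os
import shlex
import subprocess
import sys

# Directory containing this wrapper; detex.py is expected next to it.
dir_path = os.path.dirname(os.path.realpath(__file__))

# The last argument is the file to check, all others are passed on to
# languagetool. Arguments are quoted so that paths with spaces survive.
subprocess.call('cat ' + shlex.quote(sys.argv[-1]) + ' | '
                + shlex.quote(os.path.join(dir_path, 'detex.py')) + ' | '
                + 'languagetool '
                + ' '.join(shlex.quote(arg) for arg in sys.argv[1:-1]),
                shell=True)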

and detex.py does the actual stripping of LaTeX, which I hand-crafted with some regular expressions for my specific needs.

pinpox commented 6 years ago

Hello, is this feature still planned eventually/already in progress? I understand it's not very easy to implement, but LaTeX files would be the main reason I would like to use this.

@languitar's comment looks like a viable workaround. Could this not be integrated into the plugin?

pinpox commented 6 years ago

Sorry for double-posting, but here is a possible workaround. If I understand correctly, the problem with stripping LaTeX commands is that it changes the length of the code. Also, you want to treat the commands differently:

In commands like \section{My section name} you want "My section name" to be checked.
In commands like \cite{MyCitation} you do not want "MyCitation" to be checked.

I created this small ruby script:


def get_commands line

    # Commands that are blanked out completely, argument included.
    fullreplace = ["cite", "label"]
    # Commands whose argument is kept; only the markup is blanked out.
    partreplace = ["section", "subsection"]

    fullreplace.each do |fr|
        # Replace the whole command with as many spaces as it is long.
        line = line.gsub(/\\#{fr}\{[^}]*}/) { |m| " " * m.length }
    end

    partreplace.each do |pr|
        # "\name{" becomes spaces, the argument is kept, "}" becomes a space.
        line = line.gsub(/\\#{pr}\{([^}]*)}/,"#{" " *pr.length}  \\1 ")
    end
    puts line
end

ARGF.each_line do |line|
    get_commands(line)
end

You can pipe tex into it and it outputs the latex commands replaced with the correct amount of whitespace.

I know it's not pretty, just a proof of concept, but couldn't this be used if we specify all LaTeX commands in the two arrays?

languitar commented 6 years ago

Just for reference, this is my custom script, which handles a few more cases: https://gist.github.com/languitar/2037fccd8520586639aa9f1227bbf8e6

real-or-random commented 6 years ago

My texlive installation includes the "detex" tool. This is related: https://github.com/dpelle/vim-LanguageTool/pull/4

This is interesting too: https://github.com/pkubowicz/opendetex

copyme commented 6 years ago

@real-or-random unfortunately, opendetex/detex does not work well with real-life LaTeX documents. I would say that all the tools for converting (La)TeX to text that I have checked offer only very basic functionality. Maybe pandoc will be good enough at some point, but so far it has some problems too.

MarcelRobitaille commented 5 years ago

Since we don't (yet) have a tool to strip Markdown with source maps, would it be possible to open the plain-text version in a split and leave it up to the user to find the error in the source? This wouldn't be a huge deal for Markdown and would really help.

krishnakumarg1984 commented 5 years ago

@rhysd there is finally a (platform-independent) solution for the problem that blocks you from implementing a markup parser - TeXtidote!

Your Problem

".... option is to remove all this markup, leaving only the "clear" text; however, when a grammar tool points to a problem at a specific line in this clear text, it becomes hard to retrace that location in the original LaTeX file.

Solution

TeXtidote solves this problem; it can read your original LaTeX file and perform various sanity checks on it: for example, making sure that every figure is referenced in the text, enforcing the correct capitalization of titles, etc. In addition, TeXtidote can remove markup from the file and send it to the Language Tool library, which performs a verification of both spelling and grammar in a dozen languages. What is unique to TeXtidote is that it keeps track of the relative position of words between the original and the "clean" text. This means that it can translate the messages from Language Tool back to their proper location directly in your source file.

Can you try to port their logic to vim-grammarous?

copyme commented 5 years ago

@krishnakumarg1984 thanks for the info about TeXtidote, it looks pretty interesting.

matze-dd commented 4 years ago

YaLafi does filter LaTeX text, too. This project is still in an early stage, however.

oblitum commented 3 years ago

It would be great if TeXtidote support was added; it even supports JSON results for easy parsing. It is a great wrapper for LanguageTool and would make this plugin work on LaTeX and Markdown.

kevincox commented 2 years ago

I wonder if this can be fixed in the plugin instead of LanguageTool. With more and more highlighters supporting fenced languages, Vim now knows which parts of my Markdown doc are Markdown and which are code blocks. Even if the Markdown inline syntax causes a few errors, it would be great if the code blocks could be ignored, since those cause a huge number of problems and are annoying to skip over every time.
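
As a rough illustration of ignoring only the code blocks, here is a minimal Python sketch that blanks out fenced blocks line by line, without using Vim's highlighting at all. The function name is made up, and only backtick fences are recognized (no tilde fences, no indented code blocks):

```python
FENCE = "`" * 3  # a line starting with three backticks opens or closes a block

def mask_fenced_code(lines):
    """Blank out fenced code blocks, keeping the number and length of lines."""
    masked = []
    in_fence = False
    for line in lines:
        is_fence = line.lstrip().startswith(FENCE)
        if is_fence or in_fence:
            masked.append(" " * len(line))  # fence markers and code become blanks
        else:
            masked.append(line)
        if is_fence:
            in_fence = not in_fence
    return masked
```

Since no line is removed or shortened, line and column numbers reported for the masked text still point at the right place in the original document.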

pinpox commented 2 years ago

> Vim now knows what part of my markdown doc are markdown and what are code blocks

If I'm not mistaken, this should be possible with tree-sitter.

DerWeh commented 2 years ago

Recently I switched to ltex which is based on LanguageTool and does a reasonable job supporting LaTeX and other markup languages. It can simply be used as a language server in vim/neovim, see https://valentjn.github.io/ltex/installation-usage.html.

It tends to use excessive amounts of memory and so far I haven't been able to add words to the dictionary, but otherwise it seems fine. Maybe it's an option for vim-grammarous to replace bare LanguageTool with ltex.

With tree-sitter, I am rather doubtful. I am still waiting for a simple dictionary-type spell checker that works as well as Vim's default. There is spellsitter, but it hasn't convinced me to switch yet.

jdhao commented 1 year ago

@DerWeh Are you using both ltex and vim-grammarous? What is your take on their pros and cons? I also found vale. I haven't used it and am not sure how it compares to LanguageTool.

rhysd / vim-grammarous

add support for markup language #40
