Linter for grammar - Githubissues

PierreMarchand20 commented 5 years ago

Feature Request

Is your feature request related to a problem? Please describe.

Since the issue has been closed in https://github.com/James-Yu/LaTeX-Workshop/issues/881#issuecomment-533659579, I open an issue here in case you are interested in adding more linters.

Describe the solution you'd like

As I said in the previous link, we could add TeXtidote which uses LanguageTools in latex documents. It just strips out latex from your document, calls LanguageTools which then checks spell and grammar for a lot of languages. TeXtidote then computes where are the locations of the errors/warning in your document.

There was an issue (solved now) to have an easy-to-parse output (see here), so I guess it would not be too difficult to use.

It main be expensive to run it at every save for example, but having a command to call it would be nice.

tecosaur commented 5 years ago

Could be interesting. Probably more of a medium-term than short-term thing though.

tecosaur commented 5 years ago

@PierreMarchand20 something else I'd be interested in in part of speech syntax highlighting. Something like python's textblob may be useful. Another tool I'd be interested in adding support for would be vale. Would you be interested in working together to add more text/writing tools to this extension?

PierreMarchand20 commented 5 years ago

I have never worked on a vscode extension and typescript, but that could be interesting.

I looked at Vale but it seems that there are not kind to add support to LaTeX (https://github.com/errata-ai/vale/issues/54).

TextBlob seems interesting, but it only supports three languages (English, and via extensions, French and German) and it seems more focused on sentences analysis.

For the record, LanguageTool works for several languages (https://languagetool.org/languages), and it has also an extension in vscode, but it does not support latex.

In any case, I think we can always redo what Textidote does for calling LanguageTool on latex documents, which is replacing/removing every latex environments using regex and then call any linter we want.

tecosaur commented 5 years ago

If you know javascript and are willing to look at docs you'll be good :) if not: no worries. Long term, I think the best approach would be to implement the latex to plaintext, then translate positions back so that our choice of tools isn't limited. I wonder if we can just ~~steal~~ find inspiration from TeXtidote's implementation.

Then we can leave the installation of individual tools to the end-user, and just supply 'presets' i.e. the TeXtidote thing + output parsing to warnings etc. Perhaps have a setting like this

"latex-utils.externalToolIntegrations.enabled": ["vale", "TextBlob", "LanguageTool"]

What are your thoughts?

PierreMarchand20 commented 5 years ago

I do not know javascript, but I have already done some small contributions to one or two extensions for vscode and it did not seem too complicated ^^ I will try to help.

Yes, I also think it is the best approach! This way we can use any linter with the plaintext output a priori. The only thing we need to do for each linter is to parse their output, which should be not too difficult if it is well correctly done.

tecosaur commented 5 years ago

Sounds great! Here's my current plan:

Make a new branch for this
Lay down the boilerplate
Invite you to collaborate
We go from there

PierreMarchand20 commented 5 years ago

Perfect !

tecosaur commented 5 years ago

Done! See https://github.com/tecosaur/LaTeX-Utilities/tree/dev-linting

I added vale as an example of how I think it could work.

For input cleaning I think we could start by looking at https://github.com/sylvainhalle/textidote/blob/master/Source/Core/src/ca/uqac/lif/textidote/cleaning/latex/LatexCleaner.java

Addendum: I'm a bit busy over the next few weeks, but I'll try to put in time where I can

PierreMarchand20 commented 5 years ago

Perfect ! I am also quite busy starting a new job but I will also try to take a look ^^ I think it could be really interesting for all the LaTeX writers around here ahah

tecosaur commented 5 years ago

I did a bit more with vale to see what sort of things we can hope for, and it's looking good! I think the framework is generic enough that we can hope to do something similar with a bunch of tools, once we get the LaTeX to plaintext thing working (and then also a bunch of other issues).

PierreMarchand20 commented 5 years ago

Ok, I downloaded vale and cloned the repo. I started to play with it a little bit and I have some questions (remember, I have never worked on an vscode extension or with typescript).

What is the use of the line

vscode.commands.registerCommand('latex-utilities.code-action', (d, r, c, m) =>
        extension.codeActions.runCodeAction(d, r, c, m)
    )

where is it prescribed that the linter should be called when the file is saved ? I noticed that it is used only when I save my document.

PierreMarchand20 commented 5 years ago

I have not understood what is done in textidote, but here’s how I think we can implement that:

we need an array of pairs (line number, number of lines) that keeps track of deleted lines. For example, if we have an align environnement of three lines starting at line 6, the pair would be (6,3).
we also need a map whose key is the line number and the value is an array of pairs which does the same thing as the previous array, but for characters.

With these two objects, I think we can do the change of coordinates for the diagnostics given by the linter.

I will try to implement that next week.

PierreMarchand20 commented 5 years ago

But I must say that vale is almost working straightforwardly for my latex documents...

I am not a fan of the output “writing-good” though ^^ I have to try other styles

tecosaur commented 5 years ago

Great to see you've started looking at this! I look forward to seeing more of your work.

Answering your questions

Purpose of this line

vscode.commands.registerCommand('latex-utilities.code-action', (d, r, c, m) =>
        extension.codeActions.runCodeAction(d, r, c, m)
    )

Ot simply allows actions to be registered which call (from codeAction.ts)

runCodeAction(document: vscode.TextDocument, range: vscode.Range, source: string, message: string)

How run-on-save is called

In main.ts we have this block

vscode.workspace.onDidSaveTextDocument((e: vscode.TextDocument) => {
    if (e.uri.fsPath === extension.completionWatcher.snippetFile.user) {
        extension.completionWatcher.loadSnippets(true)
    } else {
        extension.wordCounter.setStatus()
        if (utils.hasTexId(e.languageId)) {
            extension.diagnoser.lintDocument(e) // <--- this line here
        }
    }
}),

Mimicing texidote

This seems like the sort of thing which benefits a lot by chosing a good data structure for it. I'll have a think about it and let you know if I have any ideas different to yours.

My own question

How do you find the interface IDiagnosticSource? (as implemented in vale.ts). If you've got any feedback it would be best to make use of it now rather than later :)

tecosaur commented 5 years ago

( btw, I don't think we should need a vale path, because vale <arguments> should work, but for some reason it didn't work on my machine )

PierreMarchand20 commented 5 years ago

Thank you for your answers! Are there any reasons why you need to check e.uri.fsPath === extension.completionWatcher.snippetFile.user for calling the linter ?

I will wait for your feedback on the data structure before starting to implement it then ^^

IDiagnosticSource looks nice, the only thing is that I would add somewhere is a way for the user to give its own args to each linter.

PierreMarchand20 commented 5 years ago

A remark I would make though is that you are always aiming to linters that we have via command line. But for example, Vale or LanguageTool both have a vscode extension, so may be it would be possible to use them ? It is just an idea.

PierreMarchand20 commented 5 years ago

I noticed that Vale has a lot of style to help with styles, but I am not sure it really helps with grammar. LanguageTool also has a java executable standalone here and it is available in a lot of languages, I suggest we add this one after Vale ^^

PierreMarchand20 commented 5 years ago

( btw, I don't think we should need a vale path, because vale <arguments> should work, but for some reason it didn't work on my machine )

I will remove it and let vale instead then. I noticed I had to reload vscode the first time to update my PATH variable that must contain the path to 'vale'. Otherwise, try to output 'process.env'

PierreMarchand20 commented 5 years ago

I started to play around with LanguageTool, the JSON format is defined here, I will try to define the associated linter.

tecosaur commented 5 years ago

Let's see

What's `e.uri.fsPath === extension.completionWatcher.snippetFile.user` doing?

That's just for extension.completionWatcher.loadSnippets(true) (live snippet stuff)

More command line args

Good idea, that's an easy tweak

Vale or LanguageTool both have a vscode extension

I think they just call command line vale/languagetool within the extension
Those are two extensions I've tried/used, the issue I've had is with them + LaTeX. e.g. \begin{enumerate} "enumerate is too wordy", or insude a table word & word & word "word is repeated" etc.

Data structure

Sorry I've forgotten to actually go over that, I'll try to remember to do so tommorow and get back to you :)

LanguageTool after Vale

Sure :smiley: I just did Vale because I like to build around an example, and I'd done a little bit with vale before (e.g. knew it output json which is nice)
Glad to hear you're starting work on that

PierreMarchand20 commented 5 years ago

Ok for e.uri.fsPath === extension.completionWatcher.snippetFile.user, what does it have to do with linters ? Could we put the lines

if (utils.hasTexId(e.languageId)) {
     extension.diagnoser.lintDocument(e) // <--- this line here
}

outside of this check ?

Could you show me where you found the documentation for the vscode diagnostic ? So that I can do the link between the vscode diagnostics and those of LanguageTool.

PierreMarchand20 commented 5 years ago

I have a first version of LanguageTool.ts, could you look at it ? The only issue was that they do not use position as (line,character), but only give an offset in the whole document. And of course, there are a lot of false positive due to latex :-)

LanguageTool offers a lot of outputs, I just used a small part for now... And I am yet to do the code-actions.

PierreMarchand20 commented 5 years ago

I just added the code actions, I learned a lot of typescript in a few hours xD

PierreMarchand20 commented 5 years ago

Last thing and then I stop spamming. The issue is that LanguageTool gives an offset and a length, considering the document as a big string. It is a recurring problem with VSCode so there are functions (positionAt) associated with vscode.TextDocument to go from this representation to using a pair of (line,offset). Problem: we need to call LanguageTool on vscode.TextDocument after we remove all the latex, so I am not sure we will still be able to use positionAt.

tecosaur commented 5 years ago

Great to see the work you've done, thanks for helping out with this!

Tweak if statement in main.ts

Done in a0008f51033793bcfe768cc661f15ab463f6641f
Documentation for vscode diagnostic
here: https://code.visualstudio.com/api/references/vscode-api#Diagnostic
Take extra arguments
Done in 258c0f4dfa38b7db5f1ef28207f0ebd33e1bc446
Ofset + length representation
Shouldn't be too hard to work around
Also, I was thinking the best way may be to create a temporary file, e.g. document.tex → .document.txt
Testing languagetool
Tried it, got NodeError: stdout maxBuffer length exceeded first
Tried it with a short document, worked!
May want to do a bit of tinkering with the error serverity
New idea
I noticed that languagetool may take some time to run, with this in mind I think we might benefit from
- Making lintDocument async (done in 14859161be12f7d3cd1b9e2e4882a3d702500a3b)
- Adding a currentlyRunning bool attribute so that multiple instances of a long(ish) running command aren't spawned

tecosaur commented 5 years ago

Furthermore to the NodeError: stdout maxBuffer length exceeded error, I think the way to solve that is by switching from using execFile to spawn.

PierreMarchand20 commented 5 years ago

Great!

I think I also got NodeError: stdout maxBuffer length exceeded, I guess this is because there are too many false positive with the latex environments. Once we will have solved the issue of removing all the latex, it should not appear again.

What async does ? It seems that it should be used with await usually. I agree the zcurrentlyRunning` idea!

tecosaur commented 5 years ago

I guess this is because there are too many false positive with the latex environments

This is just an issue with long outputs. Using spawn we should be able to handle much larger outputs. I'll take a look at that.

What async does?

Afaik this should just help with it being non-blocking

I agree the currentlyRunning idea!

Shouldn't be too hard to implement :smiley:

tecosaur commented 5 years ago

Switched to spawn and implemented currentlyRunning (as currentProcess) in 75143f05fd18c35a666f88a949529deb6b68a3f4

tecosaur commented 5 years ago

Minor update: I tested my big-ish tex file and it all worked. Also had a think about the data structure and arrays of removed content (one for lines, on for characters) seems like a relatively decent approach to me.

I'm getting a really good feeling about this!

tecosaur commented 5 years ago

Hmmm. Another thought, since these tools run on the whole file (and can take a few seconds), it could be nice to make use of a stateful model. I.e. change a few words in a paragraph, just run tools on modified paragraph instead of the full document. Perhaps we could just do this in-between saves with a timeout :thinking:

PierreMarchand20 commented 5 years ago

Perfect!

If we can create another file to parse with the linter, it should be not too difficult. But I was afraid it would be too expensive. And where could we create this file ?

I will try what I suggested then ^^

PierreMarchand20 commented 5 years ago

Hmmm. Another thought, since these tools run on the whole file (and can take a few seconds), it could be nice to make use of a stateful model. I.e. change a few words in a paragraph, just run tools on modified paragraph instead of the full document. Perhaps we could just do this in-between saves with a timeout 🤔

or we can define a command so that the user decides when he wants to call the linters?

PierreMarchand20 commented 5 years ago

It would be really interesting to run it only on modified parts, as you said, it takes time. But may be it would be difficult to do it in a first time directly.

tecosaur commented 5 years ago

The best bet is probably just to create a file in the tmp folder. See the use of tmpdir in tikzpreview.ts.

Too expensive

I think the contrary. We can easily run multiple tools on the same file, and maybe even make incremental changes

define a command so that the user decides when he wants to call the linters It would be really interesting to run it only on modified parts, as you said, it takes time. But maybe it would be difficult to do it in the first time directly.

I think the first time we just run on the whole file, there's not really much else you can do. Then as part of the linter-config we could have a setting like

"latex-utils.diagnoser.tools": {
  "languagetool": {
    "enabled": true,
    "args": [],
    "run": "onSave" | "onSaveAndIncremental" | "incrementally"
  }
}

or similar. Basicly I see there being three main modes:

Lint entire file on save
Lint entire file on save, and incrementally with timeout
After initial whole-file run, run incrementally on save & timeout

PierreMarchand20 commented 5 years ago

Ok, I will look at tmpdir then.

What do you mean by incremental changes ? Anyway, sounds great to me

tecosaur commented 5 years ago

By incremental changes I mean

- this is is a test
+ this is a test

then just rerunning the linters on the changed line.

PierreMarchand20 commented 5 years ago

Ok I see.

What I meant is that may be we should first implement the linter on the whole file, removing the latex and so on. And then, when it will work, we can try to implement the incremental feature, what do you think ?

tecosaur commented 5 years ago

That's what I was thinking :smiley: I just think it's good to keep these things in mind when implementing to we don't unintentionally make our lives harder later on.

On that note, with the LaTeX to plaintext conversion, I think I'd be good to use a very modular/versatile approach for the parsing. I'd like to do some things such as

Ignore arguments of specific commands — e.g. \*ref{...}
Ignore some environments, and their optional arguments — e.g. tikzpicture, minted

Given the existence of user-definable commands, I think it would be best to operate on a blacklist model. So I'd imagine the config would be split into two sections, commands and environments. Then for each item have a few properties that set ignoring/passthrough of some/all mandatory/optional arguments This would ideally be able to mix in / be replaced with a user config setting, any single JSON object should work.

From there the best thing to do would probably be just to compile a giant regex expression and run that against the document.

Those are my ideas, what do you think?

PierreMarchand20 commented 5 years ago

That's what I was thinking 😃 I just think it's good to keep these things in mind when implementing to we don't unintentionally make our lives harder later on.

I agree perfect!

On that note, with the LaTeX to plaintext conversion, I think I'd be good to use a very modular/versatile approach for the parsing. I'd like to do some things such as

Ignore arguments of specific commands — e.g. *ref{...} Ignore some environments, and their optional arguments — e.g. tikzpicture, minted Given the existence of user-definable commands, I think it would be best to operate on a blacklist model. So I'd imagine the config would be split into two sections, commands and environments. Then for each item have a few properties that set ignoring/passthrough of some/all mandatory/optional arguments This would ideally be able to mix in / be replaced with a user config setting, any single JSON object should work.

From there the best thing to do would probably be just to compile a giant regex expression and run that against the document.

I am not sure I understood everything.

Concerning commands, why do you want to only ignore arguments?
What are the properties you want for each item ?

In my mind I was just thinking of a list of commands and a list of environments as you say in a JSON files (so that a user can add his own). And then, for each command 'foo' we remove \foo[...]{...} and for each environment 'bar', we remove \begin{bar}...\end{bar}.

PierreMarchand20 commented 5 years ago

Then, there are some details of course (document environment, itemize/enumerate environment that we can keep, $...$ , etc)

tecosaur commented 5 years ago

I feel as though I'm overthinking this I think (for commands) an array of regex is probably best.

Unfortunately due to nesting parenthesis (e.g. \graphicspath{{...}}) we can't use just regex :(

To clarify on my terminology, I think of latex macros like this:

\command[optional1][optional2]{mandtory1}{mandatory2}

tecosaur commented 5 years ago

why do you want to only ignore arguments?

By that I mean that we want to ignore the \command bit always, but sometimes we want to pass through an argument (e.g. \emph{...}), sometimes we want to ignore it (e.g. \tikzset{...}), then a similar thing with environments of course.

Not sure what the cleanest way of being able to represent differences is what we want done would be.

PierreMarchand20 commented 5 years ago

That is when we can look at what he has done in textidote ^^

PierreMarchand20 commented 5 years ago

Then, we can define a list of commands where we delete everything, a list of command where we keep the argument. And it is a similar for environments, but it is a little bit more difficult to keep the argument of an environment.

tecosaur commented 5 years ago

Perhaps then we just take the approach where, this may be a little heavy-handed, but it will be good for 99% of the document. Remove all optional arguments of environments, and then split commands into "keep arguments" and "remove arguments". That sounds reasonable to me.

tecosaur commented 5 years ago

@PierreMarchand20 how's it going?

PierreMarchand20 commented 5 years ago

I did not have the time to do it so far, may be this weekend.

PierreMarchand20 commented 5 years ago

Ok I looked a little bit, I think actually this easiest would be to get the text of the file in a string and call regex functions. I do not see the advantage of copy paste the text in an auxiliary file. What do you think ? We can still do every operations on a string and then create an auxiliary file to store the cleaned string if it is really necessary.

With this in mind I wanted to use the function matchall (https://javascript.info/regexp-methods) which seems really perfect for us, but I cannot manage to use it. I added the necessary lib as mentioned here but still, when I compile, it does find the function. Any ideas ?

tecosaur / LaTeX-Utilities

Linter for grammar #39

Feature Request

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Answering your questions

Mimicing texidote

My own question

What's `e.uri.fsPath === extension.completionWatcher.snippetFile.user` doing?

More command line args

Vale or LanguageTool both have a vscode extension

Data structure

LanguageTool after Vale

Tweak if statement in main.ts

Documentation for vscode diagnostic

Take extra arguments

Ofset + length representation

Testing languagetool

New idea

tecosaur / LaTeX-Utilities

Linter for grammar #39

Feature Request

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Answering your questions

Mimicing texidote

My own question

What's e.uri.fsPath === extension.completionWatcher.snippetFile.user doing?

More command line args

Vale or LanguageTool both have a vscode extension

Data structure

LanguageTool after Vale

Tweak if statement in main.ts

Documentation for vscode diagnostic

Take extra arguments

Ofset + length representation

Testing languagetool

New idea

What's `e.uri.fsPath === extension.completionWatcher.snippetFile.user` doing?