Closed PierreMarchand20 closed 2 years ago
Could be interesting. Probably more of a medium-term than short-term thing though.
@PierreMarchand20 something else I'd be interested in in part of speech syntax highlighting. Something like python's textblob may be useful. Another tool I'd be interested in adding support for would be vale. Would you be interested in working together to add more text/writing tools to this extension?
I have never worked on a vscode extension and typescript, but that could be interesting.
I looked at Vale but it seems that there are not kind to add support to LaTeX (https://github.com/errata-ai/vale/issues/54).
TextBlob seems interesting, but it only supports three languages (English, and via extensions, French and German) and it seems more focused on sentences analysis.
For the record, LanguageTool works for several languages (https://languagetool.org/languages), and it has also an extension in vscode, but it does not support latex.
In any case, I think we can always redo what Textidote does for calling LanguageTool on latex documents, which is replacing/removing every latex environments using regex and then call any linter we want.
If you know javascript and are willing to look at docs you'll be good :) if not: no worries.
Long term, I think the best approach would be to implement the latex to plaintext, then translate positions back so that our choice of tools isn't limited. I wonder if we can just steal find inspiration from TeXtidote's implementation.
Then we can leave the installation of individual tools to the end-user, and just supply 'presets' i.e. the TeXtidote thing + output parsing to warnings etc. Perhaps have a setting like this
"latex-utils.externalToolIntegrations.enabled": ["vale", "TextBlob", "LanguageTool"]
What are your thoughts?
I do not know javascript, but I have already done some small contributions to one or two extensions for vscode and it did not seem too complicated ^^ I will try to help.
Yes, I also think it is the best approach! This way we can use any linter with the plaintext output a priori. The only thing we need to do for each linter is to parse their output, which should be not too difficult if it is well correctly done.
Sounds great! Here's my current plan:
Perfect !
Done! See https://github.com/tecosaur/LaTeX-Utilities/tree/dev-linting
I added vale as an example of how I think it could work.
For input cleaning I think we could start by looking at https://github.com/sylvainhalle/textidote/blob/master/Source/Core/src/ca/uqac/lif/textidote/cleaning/latex/LatexCleaner.java
Addendum: I'm a bit busy over the next few weeks, but I'll try to put in time where I can
Perfect ! I am also quite busy starting a new job but I will also try to take a look ^^ I think it could be really interesting for all the LaTeX writers around here ahah
I did a bit more with vale to see what sort of things we can hope for, and it's looking good! I think the framework is generic enough that we can hope to do something similar with a bunch of tools, once we get the LaTeX to plaintext thing working (and then also a bunch of other issues).
Ok, I downloaded vale and cloned the repo. I started to play with it a little bit and I have some questions (remember, I have never worked on an vscode extension or with typescript).
vscode.commands.registerCommand('latex-utilities.code-action', (d, r, c, m) =>
extension.codeActions.runCodeAction(d, r, c, m)
)
I have not understood what is done in textidote, but here’s how I think we can implement that:
we need an array of pairs (line number, number of lines) that keeps track of deleted lines. For example, if we have an align environnement of three lines starting at line 6, the pair would be (6,3).
we also need a map whose key is the line number and the value is an array of pairs which does the same thing as the previous array, but for characters.
With these two objects, I think we can do the change of coordinates for the diagnostics given by the linter.
I will try to implement that next week.
But I must say that vale is almost working straightforwardly for my latex documents...
I am not a fan of the output “writing-good” though ^^ I have to try other styles
Great to see you've started looking at this! I look forward to seeing more of your work.
Purpose of this line
vscode.commands.registerCommand('latex-utilities.code-action', (d, r, c, m) =>
extension.codeActions.runCodeAction(d, r, c, m)
)
Ot simply allows actions to be registered which call (from codeAction.ts
)
runCodeAction(document: vscode.TextDocument, range: vscode.Range, source: string, message: string)
How run-on-save is called
In main.ts
we have this block
vscode.workspace.onDidSaveTextDocument((e: vscode.TextDocument) => {
if (e.uri.fsPath === extension.completionWatcher.snippetFile.user) {
extension.completionWatcher.loadSnippets(true)
} else {
extension.wordCounter.setStatus()
if (utils.hasTexId(e.languageId)) {
extension.diagnoser.lintDocument(e) // <--- this line here
}
}
}),
This seems like the sort of thing which benefits a lot by chosing a good data structure for it. I'll have a think about it and let you know if I have any ideas different to yours.
How do you find the interface IDiagnosticSource
? (as implemented in vale.ts
). If you've got any feedback it would be best to make use of it now rather than later :)
( btw, I don't think we should need a vale path, because vale <arguments>
should work, but for some reason it didn't work on my machine )
Thank you for your answers! Are there any reasons why you need to check e.uri.fsPath === extension.completionWatcher.snippetFile.user
for calling the linter ?
I will wait for your feedback on the data structure before starting to implement it then ^^
IDiagnosticSource
looks nice, the only thing is that I would add somewhere is a way for the user to give its own args to each linter.
A remark I would make though is that you are always aiming to linters that we have via command line. But for example, Vale or LanguageTool both have a vscode extension, so may be it would be possible to use them ? It is just an idea.
I noticed that Vale has a lot of style to help with styles, but I am not sure it really helps with grammar. LanguageTool also has a java executable standalone here and it is available in a lot of languages, I suggest we add this one after Vale ^^
( btw, I don't think we should need a vale path, because
vale <arguments>
should work, but for some reason it didn't work on my machine )
I will remove it and let vale
instead then. I noticed I had to reload vscode the first time to update my PATH variable that must contain the path to 'vale'. Otherwise, try to output 'process.env'
I started to play around with LanguageTool, the JSON format is defined here, I will try to define the associated linter.
Let's see
e.uri.fsPath === extension.completionWatcher.snippetFile.user
doing?extension.completionWatcher.loadSnippets(true)
(live snippet stuff)\begin{enumerate}
"enumerate is too wordy", or insude a table word & word & word
"word is repeated" etc.Ok for e.uri.fsPath === extension.completionWatcher.snippetFile.user
, what does it have to do with linters ? Could we put the lines
if (utils.hasTexId(e.languageId)) {
extension.diagnoser.lintDocument(e) // <--- this line here
}
outside of this check ?
Could you show me where you found the documentation for the vscode diagnostic ? So that I can do the link between the vscode diagnostics and those of LanguageTool.
I have a first version of LanguageTool.ts, could you look at it ? The only issue was that they do not use position as (line,character), but only give an offset in the whole document. And of course, there are a lot of false positive due to latex :-)
LanguageTool offers a lot of outputs, I just used a small part for now... And I am yet to do the code-actions.
I just added the code actions, I learned a lot of typescript in a few hours xD
Last thing and then I stop spamming. The issue is that LanguageTool gives an offset and a length, considering the document as a big string. It is a recurring problem with VSCode so there are functions (positionAt
) associated with vscode.TextDocument
to go from this representation to using a pair of (line,offset). Problem: we need to call LanguageTool on vscode.TextDocument
after we remove all the latex, so I am not sure we will still be able to use positionAt
.
Great to see the work you've done, thanks for helping out with this!
document.tex
→ .document.txt
NodeError: stdout maxBuffer length exceeded
firstcurrentlyRunning
bool attribute so that multiple instances of a long(ish) running command aren't spawnedFurthermore to the NodeError: stdout maxBuffer length exceeded
error, I think the way to solve that is by switching from using execFile
to spawn
.
Great!
I think I also got NodeError: stdout maxBuffer length exceeded
, I guess this is because there are too many false positive with the latex environments. Once we will have solved the issue of removing all the latex, it should not appear again.
What async does ? It seems that it should be used with await usually. I agree the zcurrentlyRunning` idea!
I guess this is because there are too many false positive with the latex environments
This is just an issue with long outputs. Using spawn
we should be able to handle much larger outputs. I'll take a look at that.
What async does?
Afaik this should just help with it being non-blocking
I agree the
currentlyRunning
idea!
Shouldn't be too hard to implement :smiley:
Switched to spawn and implemented currentlyRunning
(as currentProcess
) in 75143f05fd18c35a666f88a949529deb6b68a3f4
Minor update: I tested my big-ish tex file and it all worked. Also had a think about the data structure and arrays of removed content (one for lines, on for characters) seems like a relatively decent approach to me.
I'm getting a really good feeling about this!
Hmmm. Another thought, since these tools run on the whole file (and can take a few seconds), it could be nice to make use of a stateful model. I.e. change a few words in a paragraph, just run tools on modified paragraph instead of the full document. Perhaps we could just do this in-between saves with a timeout :thinking:
Perfect!
If we can create another file to parse with the linter, it should be not too difficult. But I was afraid it would be too expensive. And where could we create this file ?
I will try what I suggested then ^^
Hmmm. Another thought, since these tools run on the whole file (and can take a few seconds), it could be nice to make use of a stateful model. I.e. change a few words in a paragraph, just run tools on modified paragraph instead of the full document. Perhaps we could just do this in-between saves with a timeout 🤔
or we can define a command so that the user decides when he wants to call the linters?
It would be really interesting to run it only on modified parts, as you said, it takes time. But may be it would be difficult to do it in a first time directly.
The best bet is probably just to create a file in the tmp folder. See the use of tmpdir
in tikzpreview.ts
.
Too expensive
I think the contrary. We can easily run multiple tools on the same file, and maybe even make incremental changes
define a command so that the user decides when he wants to call the linters It would be really interesting to run it only on modified parts, as you said, it takes time. But maybe it would be difficult to do it in the first time directly.
I think the first time we just run on the whole file, there's not really much else you can do. Then as part of the linter-config we could have a setting like
"latex-utils.diagnoser.tools": {
"languagetool": {
"enabled": true,
"args": [],
"run": "onSave" | "onSaveAndIncremental" | "incrementally"
}
}
or similar. Basicly I see there being three main modes:
Ok, I will look at tmpdir
then.
What do you mean by incremental changes ? Anyway, sounds great to me
By incremental changes I mean
- this is is a test
+ this is a test
then just rerunning the linters on the changed line.
Ok I see.
What I meant is that may be we should first implement the linter on the whole file, removing the latex and so on. And then, when it will work, we can try to implement the incremental feature, what do you think ?
That's what I was thinking :smiley: I just think it's good to keep these things in mind when implementing to we don't unintentionally make our lives harder later on.
On that note, with the LaTeX to plaintext conversion, I think I'd be good to use a very modular/versatile approach for the parsing. I'd like to do some things such as
\*ref{...}
tikzpicture
, minted
Given the existence of user-definable commands, I think it would be best to operate on a blacklist model. So I'd imagine the config would be split into two sections, commands and environments. Then for each item have a few properties that set ignoring/passthrough of some/all mandatory/optional arguments This would ideally be able to mix in / be replaced with a user config setting, any single JSON object should work.
From there the best thing to do would probably be just to compile a giant regex expression and run that against the document.
Those are my ideas, what do you think?
That's what I was thinking 😃 I just think it's good to keep these things in mind when implementing to we don't unintentionally make our lives harder later on.
I agree perfect!
On that note, with the LaTeX to plaintext conversion, I think I'd be good to use a very modular/versatile approach for the parsing. I'd like to do some things such as
Ignore arguments of specific commands — e.g. *ref{...} Ignore some environments, and their optional arguments — e.g. tikzpicture, minted Given the existence of user-definable commands, I think it would be best to operate on a blacklist model. So I'd imagine the config would be split into two sections, commands and environments. Then for each item have a few properties that set ignoring/passthrough of some/all mandatory/optional arguments This would ideally be able to mix in / be replaced with a user config setting, any single JSON object should work.
From there the best thing to do would probably be just to compile a giant regex expression and run that against the document.
I am not sure I understood everything.
Concerning commands, why do you want to only ignore arguments?
What are the properties you want for each item ?
In my mind I was just thinking of a list of commands and a list of environments as you say in a JSON files (so that a user can add his own). And then, for each command 'foo' we remove \foo[...]{...}
and for each environment 'bar', we remove \begin{bar}...\end{bar}
.
Then, there are some details of course (document
environment, itemize
/enumerate
environment that we can keep, $...$
, etc)
I feel as though I'm overthinking this I think (for commands) an array of regex is probably best.
Unfortunately due to nesting parenthesis (e.g. \graphicspath{{...}}
) we can't use just regex :(
To clarify on my terminology, I think of latex macros like this:
\command[optional1][optional2]{mandtory1}{mandatory2}
why do you want to only ignore arguments?
By that I mean that we want to ignore the \command
bit always, but sometimes we want to pass through an argument (e.g. \emph{...}
), sometimes we want to ignore it (e.g. \tikzset{...}
), then a similar thing with environments of course.
Not sure what the cleanest way of being able to represent differences is what we want done would be.
That is when we can look at what he has done in textidote ^^
Then, we can define a list of commands where we delete everything, a list of command where we keep the argument. And it is a similar for environments, but it is a little bit more difficult to keep the argument of an environment.
Perhaps then we just take the approach where, this may be a little heavy-handed, but it will be good for 99% of the document. Remove all optional arguments of environments, and then split commands into "keep arguments" and "remove arguments". That sounds reasonable to me.
@PierreMarchand20 how's it going?
I did not have the time to do it so far, may be this weekend.
Ok I looked a little bit, I think actually this easiest would be to get the text of the file in a string and call regex functions. I do not see the advantage of copy paste the text in an auxiliary file. What do you think ? We can still do every operations on a string and then create an auxiliary file to store the cleaned string if it is really necessary.
With this in mind I wanted to use the function matchall
(https://javascript.info/regexp-methods) which seems really perfect for us, but I cannot manage to use it. I added the necessary lib as mentioned here but still, when I compile, it does find the function. Any ideas ?
Feature Request
Is your feature request related to a problem? Please describe.
Since the issue has been closed in https://github.com/James-Yu/LaTeX-Workshop/issues/881#issuecomment-533659579, I open an issue here in case you are interested in adding more linters.
Describe the solution you'd like
As I said in the previous link, we could add TeXtidote which uses LanguageTools in latex documents. It just strips out latex from your document, calls LanguageTools which then checks spell and grammar for a lot of languages.
TeXtidote
then computes where are the locations of the errors/warning in your document.There was an issue (solved now) to have an easy-to-parse output (see here), so I guess it would not be too difficult to use.
It main be expensive to run it at every save for example, but having a command to call it would be nice.