Closed Zylence closed 9 months ago
In JabRef's localization, we keep the English keys equal to the English full text. In this way, we can have readabile code. Details at https://devdocs.jabref.org/code-howtos/localization.html.
Regarding the SnuggleTeX errors, it would be OK to show "LaTeX cannot be parsed" and have the detail message in English only. These are all very technical terms and I thnk, no JabRef tranlsator should be bothered to translate them.
To workaround the method-too-large exception, I am trying to include our custom JDK build.
Thank you for having worked on this. Sorry for the late reply. Hope, you will continue on this!
Some commands may be missing, for example I found
\text{}
to be absent,
\text appears in math-mode only. Not outside math mode. Maybe, it is available inside math mode (e..g, $\text{test}$).
I have not checked if this is the correct way to represent the text command, but now it parses it correctly.
Which concrete case did you have?
What it does not do
Please unzip what you mean.
Ah, I understand: It does not check for ampersands, wrong bibtex string concatenation, and percent signs.
For BibTeX string concatenation, we already discussed. Difficult thing as this is JabRef custom logic, too.
What we could do
1. Use the tokens provided by snuggletex to implement our own parser on top **Or**
Could be a fun excercise? With the possible outcome that the code is harder to read as our existing code?
What I would like to do
I am really fascinated by this library, it's clean, well documented and build thoughfully and extendible. I'd really like to do more with it. If you do not mind, I'd like to port our integrity checks to snuggletex, rather than writing them as we used to.
Suggestion: Finish this PR with the current functionality. - Create a new branch (based on this branch) to rewrite the other checks. In this way, you can work in parallel. If the updated code turns out to be good, we can continue working forward to include it. If it turns out it is not that maintainable, the issue https://github.com/JabRef/jabref/issues/8712 is still fixed.
In JabRef's localization, we keep the English keys equal to the English full text. In this way, we can have readabile code. Details at https://devdocs.jabref.org/code-howtos/localization.html.
Regarding the SnuggleTeX errors, it would be OK to show "LaTeX cannot be parsed" and have the detail message in English only. These are all very technical terms and I thnk, no JabRef tranlsator should be bothered to translate them.
Okay, just to make sure I understand you correctly, a message would for example look like this:
LaTeX cannot be parsed: Nothing following \
And we would only translate this part: LaTeX cannot be parsed.
So having Nothing following \
inside the properties file is no necessary?
Consequently we would only need one entry inside the properties file:
LaTeX\ cannot\ be\ parsed:=LaTeX cannot be parsed: %0
and "feed it" the internal error messages.
To workaround the method-too-large exception, I am trying to include our custom JDK build.
Sounds great, I am looking forward to trying it out :^)
Thank you for having worked on this. Sorry for the late reply. Hope, you will continue on this!
No worries, I was very busy anyway. I would be happy to continue :^) .
\text appears in math-mode only. Not outside math mode. Maybe, it is available inside math mode (e..g, ).
It really is not handled. I looked through the library with text search and tried it out. \text is an undefined command according to snuggletex.
Ah, I understand: It does not check for ampersands, wrong bibtex string concatenation, and percent signs.
Exactly, could have made this clearer, sorry for the confusion.
Could be a fun excercise? With the possible outcome that the code is harder to read as our existing code?
Just guessing, I'd say it won't make much of a difference. Perhaps we should just be happy with what we already have. I will do a case study regardless, so we can see the possible outcome before deciding.
Suggestion: Finish this PR with the current functionality. - Create a new branch (based on this branch) to rewrite the other checks. In this way, you can work in parallel. If the updated code turns out to be good, we can continue working forward to include it. If it turns out it is not that maintainable, the issue https://github.com/JabRef/jabref/issues/8712 is still fixed.
Sounds good to me, I'll do it. :)
Okay, just to make sure I understand you correctly, a message would for example look like this:
LaTeX cannot be parsed: Nothing following \
And we would only translate this part:
LaTeX cannot be parsed.
So having
Nothing following \
inside the properties file is no necessary?
Yes. We should move forward in that way. Later, we should collect (somehow) the returned errors and translate them. We have telemetry infrastructure for that, but currently not working. Needs some updates...
Consequently we would only need one entry inside the properties file:
LaTeX\ cannot\ be\ parsed:=LaTeX cannot be parsed: %0
The %0 needs to be on the left side, too. Key and value really need to be the same 😅
and "feed it" the internal error messages.
Yes!
Thank you for having worked on this. Sorry for the late reply. Hope, you will continue on this!
No worries, I was very busy anyway. I would be happy to continue :^) .
Looking forward 🤩
\text appears in math-mode only. Not outside math mode. Maybe, it is available inside math mode (e..g, ). It really is not handled. I looked through the library with text search and tried it out. \text is an undefined command according to snuggletex...
Maybe worth a PR for SnuggleTex? 😅
Could be a fun excercise? With the possible outcome that the code is harder to read as our existing code? Just guessing, I'd say it won't make much of a difference. Perhaps we should just be happy with what we already have. I will do a case study regardless, so we can see the possible outcome before deciding.
👍
Later, we should collect (somehow) the returned errors and translate them. We have telemetry infrastructure for that, but currently not working. Needs some updates...
Do you mean to translate them automaticly, via a service?
The %0 needs to be on the left side, too. Key and value really need to be the same 😅
Slip of the pen. 😅
Yes!
Errors are now prefixed with "LaTeX Parsing Error:" I found that quite appealing, but we can change that back to "LaTeX cannot be parsed" if you like.
Maybe worth a PR for SnuggleTex? 😅
I can try, but do you think it will be merged? I saw your PR in the repo is still dangling around unmerged. 😅
Please add test cases - and then this should be good to go for a review at the jabref repo
Tests will be ready over the next week. :)
Code looks pretty readable.
Thanks, just did some performance oriented refactoring regardless, but that should not have too big of an effect on readability. Engine and Session are now static members, prior to that we instantiated them per bib entry. (Keeping the references aroung only adds ~1 KB of memory overhead)
Later, we should collect (somehow) the returned errors and translate them. We have telemetry infrastructure for that, but currently not working. Needs some updates... Do you mean to translate them automaticly, via a service?
No ^^. Here my line of thought:
The translation will be done as usual using JabRef_en.properties
.
Errors are now prefixed with "LaTeX Parsing Error:" I found that quite appealing,
Reas good!
Maybe worth a PR for SnuggleTex? 😅 I can try, but do you think it will be merged? I saw your PR in the repo is still dangling around unmerged. 😅
@davemckain is the original developer on snuggletex. My bet is that he is happy if several people contribute to his repository - https://github.com/davemckain/snuggletex.
Note to self: The other "maintained" fork seems to be https://github.com/rototor/snuggletex. I would, however, like to stick to the "original" one ^^.
Code looks pretty readable.
Please add test cases - and then this should be good to go for a review at the jabref repo
Test cases have been added in this commit. Some that I expected to work did not (due to snuggletex) these are commented out for now.
Code looks pretty readable. Please add test cases - and then this should be good to go for a review at the jabref repo
Test cases have been added in this commit. Some that I expected to work did not (due to snuggletex) these are commented out for now.
Well that was unfortunate. I accidentally closed the PR in this comment, sorry.
Later, we should collect (somehow) the returned errors and translate them. We have telemetry infrastructure for that, but currently not working. Needs some updates... Do you mean to translate them automaticly, via a service?
No ^^. Here my line of thought:
- The number error messages is greater than ten.
- We do not know if we need to translate them all.
- How to find out which messages should be translated?
- Idea: Use telemtry! Collect the send error messages centrally.
- After some time, we know the set of tuples (error message, number of occurrence)
- These can then be translated.
The translation will be done as usual using
JabRef_en.properties
.Errors are now prefixed with "LaTeX Parsing Error:" I found that quite appealing,
Reas good!
Oh, okay, now I get it - I did not know you had telemetry infrastructure for that kind of thing. Thanks for letting me know. I agree, narrowing down the selection of messages that need translation in advance is the better choice.
This should now be a PR to JabRef's main repo. I resolved the merge conflicts at https://github.com/koppor/jabref/commit/4fb512e76bb441c299702e3382a81e90789200ff.
Should go as one commit in the upstream repo - if possible.
This should now be a PR to JabRef's main repo. I resolved the merge conflicts at 4fb512e.
Should go as one commit in the upstream repo - if possible.
Thank you, I am happy to hear that. I am excited to see how it will do in the wild! :d
Submitted https://github.com/JabRef/jabref/pull/10376 - therefore closing this.
Snuggletex
Snuggletex is a library from the University of Edinburgh for converting latex to XML, but can be used for latex parsing as well. It is extendible, easy to use and powerful, all whilst containing almost no external dependencies. In the future, it could become our main latex parser for integrity checks.
What it does do
Takeaways
Some commands may be missing, for example I found
\text{}
to be absent, to check which commands are supported by default, refer here: CorePackageDefinitions.java Thankfully, enough of the package is exposed to be able to inject new commands, like so for example:engine.getPackages().get(0).addComplexCommandOneArg("text", false, ALL_MODES,LR, StyleDeclarationInterpretation.NORMALSIZE, null, TextFlowContext.ALLOW_INLINE);
I have not checked if this is the correct way to represent the text command, but now it parses it correctly.
What it does not do
What we could do
Use the tokens provided by snuggletex to implement our own parser on top
Or
Keep our integrity checks for & and # and implement % like we used to
What I would like to do
I am really fascinated by this library, it's clean, well documented and build thoughfully and extendible. I'd really like to do more with it. If you do not mind, I'd like to port our integrity checks to snuggletex, rather than writing them as we used to.
Mandatory checks
CHANGELOG.md
described in a way that is understandable for the average user (if applicable)