Closed marcoagpinto closed 9 months ago
@marcoagpinto: Do you use LibreOffice 7.6? This version uses a new feature that should make cache generation easier and faster (not slower). Could you please test whether the same problems happen if you save your document as ODT, close LO and open the document again? (If not, it is a problem with the DOCX format.) Could you reproduce the problem with a simple DOCX document? Maybe tomorrow I can take a look at the problem. After that, I'll be on vacation for two weeks.
Yes, I have the latest LibreOffice, 7.6.0.3, with Windows 11.
After installing the nightly, I turned on some rules for pt-PT.
The thesis was already in .odt when I sent you the log.
First, I opened the .docx, saved as .odt, closed LibreOffice, deleted the LanguageTool log and opened the .ODT.
After a wait of 20+ minutes, scrolling was working fine, so I disabled the rule that complains about paragraph length.
Then I was scrolling down and it was underlining some words in green.
I kept scrolling and clicking on the green lines, until it stopped showing the pop-up menu options (very old bug?).
So, I went to the add-on toolbar and clicked on refresh the text analysis, and then it became slow as hell again.
So, I closed LibreOffice and pasted the log here.
Does this help?
Thanks!
@marcoagpinto In the current developer version (6.4) I have introduced a number of improvements that particularly affect performance. Would you like to test the version with your document and give me feedback?
@FredKruse
Sure, I will do it tonight.
Last week or so (I can't remember) I unzipped the .oxt and changed all “temp_off” rules to “on” and replaced goal-specific keywords with a space, to use all Portuguese rules, and while scrolling down the thesis I was getting tons of Java errors.
@FredKruse
I am very sad with the latest nightly.
I downloaded it, installed it, opened a small Portuguese .ODT, and then I went to the folder settings and deleted the cache + config + log, and clicked on the button to default the settings and turned on the multicore use option.
The files appeared again as expected.
I closed the latest version of LibreOffice and also the quickstart icon.
I opened my thesis in .DOCX, set the whole document language to Portuguese and saved as .ODT.
As I opened the thesis, there was no longer the LanguageTool toolbar, and the log file had errors in it:
LT office integration log from Thu Jan 04 04:20:26 GMT 2024
LanguageTool 6.4-SNAPSHOT (2024-01-03 17:52:33 +0000, 3b06ad5)
OS: Windows 11 10.0 on amd64
LibreOffice 7.6.4.1 (The Document Foundation), en-GB
Java-Version: 1.8.0_391, max. Heap-Space: 7262 MB, LT Heap Space Limit: 6536 MB
MultiDocumentsHandler: getLinguisticServices: linguServices set: is NOT null
CacheIO: getCachePath: cacheFileName == null!
MultiDocumentsHandler: getNumDoc: Document 0 created; docID = 1
SingleDocument: writeCaches: Copy DocumentCache
SingleDocument: writeCaches: Copy ResultCache 0
SingleDocument: writeCaches: Copy ResultCache 1
SingleDocument: writeCaches: Copy ResultCache 2
SingleDocument: writeCaches: Copy ResultCache 3
SingleDocument: writeCaches: Save Caches ...
SingleDocument: writeCaches: Copy DocumentCache
SingleDocument: writeCaches: Copy ResultCache 0
SingleDocument: writeCaches: Copy ResultCache 1
SingleDocument: writeCaches: Copy ResultCache 2
SingleDocument: writeCaches: Copy ResultCache 3
SingleDocument: writeCaches: Save Caches ...
java.lang.NullPointerException
at org.languagetool.openoffice.FlatParagraphTools.getAllFlatParagraphs(FlatParagraphTools.java:309)
at org.languagetool.openoffice.DocumentTextCache.refreshWriterCache(DocumentTextCache.java:262)
at org.languagetool.openoffice.DocumentTextCache.refresh(DocumentTextCache.java:177)
at org.languagetool.openoffice.DocumentCache.refresh(DocumentCache.java:67)
at org.languagetool.openoffice.DocumentTextCache.refreshAndCompare(DocumentTextCache.java:2093)
at org.languagetool.openoffice.CheckRequestAnalysis.handleCacheChanges(CheckRequestAnalysis.java:650)
at org.languagetool.openoffice.CheckRequestAnalysis.getNumberOfParagraphFromSortedTextId(CheckRequestAnalysis.java:115)
at org.languagetool.openoffice.SingleDocument.getCheckResults(SingleDocument.java:327)
at org.languagetool.openoffice.SingleDocument.getCheckResults(SingleDocument.java:185)
at org.languagetool.openoffice.MultiDocumentsHandler.getCheckResults(MultiDocumentsHandler.java:264)
at org.languagetool.openoffice.MultiDocumentsHandler.doProofreading(MultiDocumentsHandler.java:202)
at org.languagetool.openoffice.Main.doProofreading(Main.java:80)
WARNING: DocumentCache: refresh: paragraphContainer == null - ParagraphCache not initialised
XDrawPageSupplier == null
WARNING: DocumentCache: refresh: paragraphContainer == null - ParagraphCache not initialised
CacheIO: CacheCleanUp: Remove Path from CacheMap: /C:/XXXXXXX/LANGUAGETOOL TESTS/Sample doc 20230102.odt
CacheIO: CacheCleanUp: Remove Path from CacheMap: /C:/XXXXXXX/LANGUAGETOOL TESTS/PhD_thesis_marcoagpinto_IST_1Main_V0092unsent.odt
MultiDocumentsHandler: getNumDoc: Document 1 created; docID = 2
Also, could you please add a button "Enable all Temp_off rules" so that I don't need to manually enable them one by one, or have to unzip the .oxt and replace all "temp_off" with "on"?
Thanks!
@FredKruse
Ahhh… look what just happened. Last week I thought it was because I had replaced all “temp_off” with “on”, but this time I didn't unzip the .oxt:
@marcoagpinto The error looks really serious and seems to be deeply rooted inside the extension. I urgently need the document that is causing this issue. I cannot reproduce the error with my test files. Could you provide it to me? All other errors could be secondary errors.
@FredKruse
Sent!
Please DON'T share the document with anyone, as it is highly classified.
I will only use the document for testing and delete it afterward.
@FredKruse
Ahhh… look what just happened. Last week I thought it was because I had replaced all “temp_off” with “on”, but this time I didn't unzip the .oxt:
I found the bug for this. It is solved now (in the next nightly).
The whole document was analyzed without problems in my tests, but it took a very long time (20 minutes on my laptop). One point is that you use many English words in your text, and all of them are marked as spelling errors. The extension has used LT as its spell checker since 6.3. It runs very well if there are a few dozen spelling errors in the text; if there are hundreds, the check slows down considerably.
If you agree, I'll do a few more tests before deleting the document. Maybe I can find a way to speed up the checks.
@FredKruse
Sure, do as many tests as possible.
Thanks!
😋 😋 😋 😋 😋 😋
@marcoagpinto I added an option in the options dialog (General) to enable temporarily disabled rules. It is turned off by default, so you have to switch it on, but the setting will be saved. Please test it in the next nightly.
@FredKruse
Thank you a lot!
❤️ ❤️ ❤️ ❤️ ❤️ ❤️ ❤️
@marcoagpinto The latest nightly also includes changes that speed up the LT checks by about 25%. Unfortunately, I have no other ideas for changes that would help with the problem.
@FredKruse
Thank you, Fred. I still haven't tested it because @p-goulart committed changes to multiword handling after the nightly build time, so they will only land in today's nightly (and I want to download the latest stand-alone tool + Wikipedia tool with those changes working).
My idea to speed up hundreds of words that appear as typos would be to store them in a dynamic array and check each word of a document there before checking in the Hunspell dictionaries… as the years go by, I get more and more ideas for complex algorithms.
@FredKruse
EDIT: For example: “Sou perito in fyrmware e cryptoware e fyrmware”.
While checking the document word by word, first check the word in a dynamic typo dictionary for that document.
If “fyrmware” isn't in the Hunspell dictionary, it would be added to a custom typo dictionary.
It would continue word by word, but now checking the typo dictionary first whenever it is not empty.
If the word is in the typos array, it is underlined as a typo, the lookup stops there, and the word is NOT checked against the Hunspell dictionaries.
This should increase speed a lot.
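The idea above can be sketched roughly as follows. This is only an illustration, not LanguageTool code: the class `TypoCacheSpeller`, its methods, and the `Predicate<String>` standing in for the real dictionary lookup are all hypothetical names, and the sketch only measures how many dictionary lookups are avoided, not actual run time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Sketch: a per-document set of known typos consulted before the (slow) dictionary. */
class TypoCacheSpeller {
  // Hypothetical stand-in for the real dictionary lookup (e.g. Hunspell/Morfologik).
  private final Predicate<String> dictionary;
  // Words already found to be misspelled in this document.
  private final Set<String> knownTypos = new HashSet<>();
  private int dictionaryLookups = 0;

  TypoCacheSpeller(Predicate<String> dictionary) {
    this.dictionary = dictionary;
  }

  /** Returns true if the word should be underlined as a typo. */
  boolean isMisspelled(String word) {
    if (knownTypos.contains(word)) {
      return true; // seen before: skip the dictionary entirely
    }
    dictionaryLookups++;
    if (dictionary.test(word)) {
      return false; // word is in the dictionary
    }
    knownTypos.add(word); // remember the typo for later occurrences
    return true;
  }

  int dictionaryLookups() {
    return dictionaryLookups;
  }

  public static void main(String[] args) {
    // Tiny fake dictionary, just enough for the example sentence.
    Set<String> fakeDictionary = new HashSet<>(Arrays.asList("Sou", "perito", "e"));
    TypoCacheSpeller speller = new TypoCacheSpeller(fakeDictionary::contains);
    for (String word : "Sou perito in fyrmware e cryptoware e fyrmware".split(" ")) {
      if (speller.isMisspelled(word)) {
        System.out.println("typo: " + word);
      }
    }
    System.out.println("dictionary lookups: " + speller.dictionaryLookups());
  }
}
```

In this sketch only the second “fyrmware” skips the dictionary. Correctly spelled words are still looked up every time, so any gain would depend on how often the same typo repeats in the document.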
@marcoagpinto the improvements to the dictionary include some language-independent elements. Hopefully everything is reviewed and merged on Monday, but it could also take a little while longer. Those changes are not live.
As for your idea of keeping a separate 'typo' dictionary, I'm not that convinced there would be a considerable performance boost. It is true that fetching suggestions is the more time-consuming part of the process, so any kind of bootstrapping we can add would be great. Note, though, that Morfologik already has an .info file that we can use to prioritise specific matches over others, and it already contains a rule for y -> i. In my experience it has sped up the spellchecking process somewhat for the most frequent typos.
@FredKruse
Heya!
Fantastic work!
It took around 2 min 22 s to check the whole thesis on my 9th generation i7 laptop.
LibreOffice crashed when I created a blank document, but that may be a LibreOffice bug (the GUI became black).
Thank you!
@marcoagpinto Can we close this issue?
Heya, @FredKruse
I have been using the Microsoft Word 365 add-on to revise my 633-page thesis.
However, today I opened it with LibreOffice and the latest nightly extension.
It is terribly slow analysing the text, and the log file throws several errors: