Closed: joostkremers closed this 7 years ago.
I'm not experiencing any notable slowdowns even in large org files. how-many is just counting matches of a regular expression, and that shouldn't be slow even with larger texts. I think there may be another issue that is causing these slowdowns. We could certainly introduce a workaround (e.g., use either the paragraph or the last N words, whichever is shorter), but it would be better to track down the true cause of this issue. The fact that you experienced these slowdowns only recently also suggests that the problem may not be rooted in guess-language, because I didn't make any changes over the last two months.
> I'm not experiencing any notable slowdowns even in large org files.
Yeah, I was afraid of that. ;-)
BTW, this is with org 9.0.5. Not sure if that makes a difference.
> how-many is just counting matches of a regular expression and that shouldn't be slow even with larger texts. I think there may be another issue that is causing these slowdowns.
My investigations suggest otherwise. I added a call to message to guess-language-region to output the beginning and end of the region, and I generally get fairly outrageous numbers, such as "guess-language-region entered: 11100 54425". Plus, the profiler output suggests that how-many is indeed slow when run against such a large region.
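For reference, the same kind of logging can be added without editing the package source, using advice. This is only a sketch: my/guess-language-log-region is a hypothetical name, and it assumes the first two arguments of guess-language-region are the region's start and end (which matches the log line above, but has not been checked against the package source). The &rest argument list keeps it from depending on the exact signature otherwise.

```emacs-lisp
;; Hypothetical helper: log the region that guess-language-region receives.
;; ASSUMPTION: the first two arguments are the region's start and end.
(defun my/guess-language-log-region (&rest args)
  "Log ARGS passed to `guess-language-region', plus the region length."
  (let ((beg (nth 0 args))
        (end (nth 1 args)))
    (if (and (integerp beg) (integerp end))
        (message "guess-language-region entered: %d %d (length %d)"
                 beg end (- end beg))
      ;; Fall back to printing the raw arguments if they are not integers.
      (message "guess-language-region entered: %S" args))))

;; Install the advice only once guess-language has actually been loaded.
(with-eval-after-load 'guess-language
  (advice-add 'guess-language-region :before #'my/guess-language-log-region))

;; To remove it again:
;; (advice-remove 'guess-language-region #'my/guess-language-log-region)
```

Because message returns the string it displays, the helper can also be called directly, e.g. (my/guess-language-log-region 11100 54425).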
Note that this is primarily a problem in tables, though there is also some slowdown when navigating collapsed headers. In normal text, it isn't nearly as bad.
> The fact that you experienced these slowdowns only recently also suggests that the problem may not be rooted in guess-language because I didn't make any changes over the last two months.
Well, I haven't been using any Org files for the past two months, so that may also be a reason. ;-)
The point is that disabling guess-language-mode makes the problem go away. Also, after changing the definition of guess-language to:
...
(let ((beginning (max (point-min) (- (point) 100)))
      (end (min (point-max) (+ (point) 100))))
...
the slowdowns disappeared.
So I believe the facts clearly indicate that guess-language-region should not be called on a region that is too large, and 40000+ characters is certainly too large. It could well be that there is some idiosyncrasy in my Org files that makes guess-language use such large regions, but regardless, I think it would be best if guess-language-region guarded against being called on regions that are too large.
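A guard like that could look roughly like the following sketch. Both my/guess-language-max-region-length and my/guess-language-clamp-region are hypothetical names (not part of guess-language), and the 2000-character cap is an arbitrary assumption. The idea is simply to cut an oversized region down to a window around point before it is handed to guess-language-region.

```emacs-lisp
;; Hypothetical cap; 2000 characters is an arbitrary assumption.
(defvar my/guess-language-max-region-length 2000
  "Maximum number of characters to pass to language detection.")

(defun my/guess-language-clamp-region (beg end &optional pos)
  "Return (BEG . END), cut down to a window around POS if it is too long.
POS defaults to point and is first moved into the range [BEG, END].
If END - BEG does not exceed `my/guess-language-max-region-length',
the region is returned unchanged; otherwise the result is at most that
many characters, centred on POS, and never extends past BEG or END."
  (let ((pos (min end (max beg (or pos (point)))))
        (half (/ my/guess-language-max-region-length 2)))
    (if (<= (- end beg) my/guess-language-max-region-length)
        (cons beg end)
      (cons (max beg (- pos half))
            (min end (+ pos half))))))
```

For instance, (my/guess-language-clamp-region 11100 54425 30000) returns (29000 . 31000). The trade-off is that once a region is clamped, the result depends on where point is, not only on which paragraph it is in.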
Sorry, I should have been more precise. I wasn't suggesting that this slowdown has nothing to do with guess-language at all. Clearly the long processing time is incurred in how-many, which is called by guess-language. However, that doesn't mean that the cause of the problem is necessarily in guess-language. If we get outrageous numbers for the start and end of the paragraph, that suggests that the problem is probably in the code for detecting paragraphs, i.e., outside guess-language's responsibility. This is also supported by the fact that you still get outrageous numbers even when you're using org-backward-paragraph and org-forward-paragraph. Unless your document consists of one or more gigantic paragraphs, that shouldn't happen.
If there is a problem with paragraph detection, that should be fixed first, not least because it will likely create problems other than just slowing down guess-language. You say that guess-language should guard against using regions that are too large, but that's exactly what I wanted to achieve when I decided to do detection on a per-paragraph basis. Perhaps additional guards are necessary, but I wouldn't want to add a workaround for an issue that may be specific to your particular setup. So let's find out why you're getting these ridiculously long paragraphs, and we can go from there.
First off, I should note that when I tried using org-(forward|backward)-paragraph in guess-language to remedy my problem, I changed the wrong function (guess-language-paragraph rather than guess-language). If I change the right function, the problem is indeed solved.
Basically, what happens is this: if you have something like the following in an Org file:
* Some Heading
Some text.
** Another Heading
| A | Table |
|---|-------|
| | |
and point is somewhere in the table, then backward-paragraph moves to the beginning of "Some text". But if instead you have:
* Some Heading
- Some list item
** Another Heading
| A | Table |
|---|-------|
| | |
then backward-paragraph will skip over "- Some list item".
Crucially, backward-paragraph also skips over headings and the beginning of tables. So in this particular example, backward-paragraph would move point to the beginning of the buffer.
The Org-specific functions, org-(forward|backward)-paragraph, do stop at headings, list items and tables. So in the second example, with point in the table, org-backward-paragraph moves point to the beginning of the table.
This means that if you have an Org file that consists solely of headings, lists and tables, backward-paragraph will move point to the beginning of the buffer. And it just so happens that this is the case in the Org file I'm experiencing slowdowns in. It doesn't have much "normal" text (or even none at all), so every time guess-language is called, it basically checks the entire buffer.
In other words, not all text formats can use the default (forward|backward)-paragraph functions. Given that fact, I think (forward|backward)-paragraph aren't the best choice for ensuring that the region to be checked isn't too large.
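One way to see the difference in a given setup is to run both motions on the same text. The helper below is a hypothetical sketch (my/compare-paragraph-starts and my/example-org-text are not part of guess-language or Org); it only reports positions and makes no claim about what they will be, since this behaviour may vary between Org versions and configurations.

```emacs-lisp
(require 'org)

(defun my/compare-paragraph-starts (text pos)
  "Compare where the two paragraph motions land in an Org buffer.
Insert TEXT into a temporary Org buffer, put point at POS, and return
\(BACKWARD-PARAGRAPH . ORG-BACKWARD-PARAGRAPH), i.e. the positions
reached by `backward-paragraph' and `org-backward-paragraph'."
  (with-temp-buffer
    (insert text)
    (org-mode)
    (goto-char pos)
    (cons (save-excursion (backward-paragraph) (point))
          (save-excursion (org-backward-paragraph) (point)))))

;; The second example from above: a list item followed by a table.
(defvar my/example-org-text
  (concat "* Some Heading\n"
          "- Some list item\n"
          "** Another Heading\n"
          "| A | Table |\n"
          "|---|-------|\n"
          "| | |\n")
  "Sample Org text: a heading, a list item and a table.")
```

For example, (my/compare-paragraph-starts my/example-org-text 83) puts point in the last table row of that example and returns the two positions as a cons cell.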
So could we fix this by simply using org-backward-paragraph instead of backward-paragraph, and likewise for forward-paragraph, whenever we're in an Org buffer?
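A minimal sketch of that idea, assuming nothing beyond the standard backward-paragraph/forward-paragraph commands and Org's org-backward-paragraph/org-forward-paragraph. The my/ function names are hypothetical; they are not the functions guess-language itself defines.

```emacs-lisp
;; Tell the byte compiler these come from Org; Org is only needed
;; when the functions are actually called in an Org buffer.
(declare-function org-backward-paragraph "org")
(declare-function org-forward-paragraph "org")

(defun my/guess-language-backward-paragraph ()
  "Move to the start of the paragraph, using Org's motion in Org buffers."
  (if (derived-mode-p 'org-mode)
      (org-backward-paragraph)
    (backward-paragraph)))

(defun my/guess-language-forward-paragraph ()
  "Move to the end of the paragraph, using Org's motion in Org buffers."
  (if (derived-mode-p 'org-mode)
      (org-forward-paragraph)
    (forward-paragraph)))
```

Dispatching on derived-mode-p rather than comparing major-mode directly also covers modes derived from org-mode.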
> So could we fix this by simply using org-backward-paragraph instead of backward-paragraph and likewise for forward-paragraph whenever we're in an org buffer?
For Org buffers that should be enough, yes. There's of course the theoretical consideration that other text modes may also not work with the standard (forward|backward)-paragraph functions, but if such an issue ever comes up, you could deal with it then.
See 2fd50238e1b30603754497195b6411c8996cb769 and let me know if this works for you. I have to say, I'm not sure that this is the correct solution. The granularity is now really fine. For example, each item in a list is a paragraph, and most of my list items do not provide enough material for reliable language identification. So we may have to change this to something more sophisticated, e.g., use backward/forward-paragraph unless it gives us an insanely large region, and only then fall back to org-backward-paragraph. Or use the default paragraph in Org but never more than the current subtree.
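The first of those two options could be sketched as follows. This is an illustration under assumptions, not the code in the commit: my/guess-language-paragraph-bounds and my/guess-language-max-paragraph-length are hypothetical names, and the 5000-character threshold is arbitrary.

```emacs-lisp
(declare-function org-backward-paragraph "org")
(declare-function org-forward-paragraph "org")

;; Hypothetical threshold; 5000 characters is an arbitrary assumption.
(defvar my/guess-language-max-paragraph-length 5000
  "Above this size, Org buffers fall back to Org's paragraph motion.")

(defun my/guess-language-paragraph-bounds ()
  "Return (BEG . END) of the region to use for language detection at point.
Use the generic paragraph motion first, since it usually yields larger
chunks of text.  Only in Org buffers, and only if that region is longer
than `my/guess-language-max-paragraph-length', fall back to the
finer-grained `org-backward-paragraph' and `org-forward-paragraph'."
  (let ((beg (save-excursion (backward-paragraph) (point)))
        (end (save-excursion (forward-paragraph) (point))))
    (when (and (derived-mode-p 'org-mode)
               (> (- end beg) my/guess-language-max-paragraph-length))
      (setq beg (save-excursion (org-backward-paragraph) (point))
            end (save-excursion (org-forward-paragraph) (point))))
    (cons beg end)))
```

The subtree-based alternative would instead keep the default paragraph but intersect it with the current subtree's bounds; that variant needs extra care before the first heading, where there is no subtree.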
That should work; I currently have something similar in a local copy of guess-language. But you're right that the granularity is probably too fine now. I had another approach at first:
https://github.com/joostkremers/guess-language.el/commit/b2474dbf301249c4337bc8b3fb9cbb6bc383bce7
Something like that could perhaps be combined with (forward|backward)-paragraph to guard against backward-paragraph moving too far.
You're using the last 100 characters, right? By-paragraph language detection is a feature, and this would break it. Specifically, when point is at the beginning of a paragraph, you would effectively guess the language of the previous paragraph, not the current one. 100 characters may also be too few for a reliable guess. I think what I will do is use org-backward/forward-paragraph unless we're in a list, in which case I will use org-beginning/end-of-item-list; something like that.
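Roughly, that plan might look like this sketch. my/guess-language-org-bounds is a hypothetical name; it relies on Org's org-in-item-p, org-beginning-of-item-list and org-end-of-item-list, which act on the current list or sublist, so a nested list would be treated as its own unit. The change that was eventually committed may well differ.

```emacs-lisp
(require 'org)

(defun my/guess-language-org-bounds ()
  "Return (BEG . END) for language detection at point in an Org buffer.
Inside a plain list, the whole (sub)list counts as one paragraph;
elsewhere, Org's own paragraph motion is used."
  (if (org-in-item-p)
      ;; Point is in a list item: span the entire current list.
      (cons (save-excursion (org-beginning-of-item-list) (point))
            (save-excursion (org-end-of-item-list) (point)))
    ;; Not in a list: fall back to Org's notion of a paragraph.
    (cons (save-excursion (org-backward-paragraph) (point))
          (save-excursion (org-forward-paragraph) (point)))))
```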
Yeah. I was thinking you could do something like
(max (save-excursion (backward-paragraph) (point)) (- (point) 100))
...
Or whatever value instead of 100 makes sense. But that would still mean that larger paragraphs aren't tested entirely. Not sure if that's an issue.
The paragraph language would depend on the position of point and I'm not sure that is a desirable property. The main language of a paragraph is what it is independently of the point.
I changed the code such that org lists are treated as paragraphs: 8c8a1616b6a7bc4c10942ee0a1b2591b98fcd493 That should work ok in most practical cases.
Thanks. There don't seem to be any slowdowns. I'll let you know if I run into any trouble, but I think we can close this.
Sorry for bringing up this issue again, but I am experiencing the slowdown in Org buffers.
I am using:
Disabling guess-language-mode in the Org buffer removes the slowdown. Also, it happens every time, regardless of buffer size.
If you need me to do some tests or provide extra information, please feel free to ask.
If it happens even in small buffers, this is probably a different problem, but let's find out. Could you please use the function below to see which region guess-language is using for detection?
(defun guess-language-current-region ()
  "Show and highlight the region guess-language would use at point."
  (interactive)
  (let ((beg (save-excursion (guess-language-backward-paragraph) (point)))
        (end (save-excursion (guess-language-forward-paragraph) (point))))
    (move-overlay mouse-secondary-overlay beg end (current-buffer))
    (message "Region beg: %d Region end: %d Region length: %d" beg end (- end beg))))
Place point on a paragraph where you experience slow detection and then call this function. It shows the region coordinates in the minibuffer and also highlights the region that would be used for detection. Are these regions excessively large?
FWIW, I'm running the latest version of guess-language as well and haven't seen any slowdowns anymore. So it most likely is a different issue.
@tmalsburg this is an example, if you need more just ask:
Region beg: 785 Region end: 802 Region length: 17
@joostkremers I'm running the latest guess-language as well, but the slowdowns only go away when I disable it in the current buffer.
So guess-language is running on just 17 characters. Since this is shorter than the minimal paragraph length (guess-language-min-paragraph-length), guess-language should actually not do anything at all. Could you run M-x guess-language at the same position and check how long that takes?
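To get an actual number rather than an impression, the call can be timed with benchmark-run from Emacs's built-in benchmark library. my/time-call is a hypothetical helper name, and using it on guess-language assumes that command can be called without arguments, as M-x guess-language suggests.

```emacs-lisp
(require 'benchmark)

(defun my/time-call (fn)
  "Call FN once with no arguments and return the elapsed time in seconds."
  ;; benchmark-run returns (ELAPSED GC-COUNT GC-ELAPSED); keep ELAPSED.
  (car (benchmark-run 1 (funcall fn))))
```

With point on the paragraph in question, M-: (my/time-call #'guess-language) RET returns the elapsed seconds. If the call never returns, enabling M-x toggle-debug-on-quit first and then pressing C-g should produce a backtrace showing where it is stuck.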
It hangs indefinitely.
Ah, I think it's a corner case that I forgot to handle in the latest commit. Do you have a list at the very beginning of the document?
Yes, and also throughout the whole document. An excerpt:
* Super to open Dash
- sudo apt remove dell-super-key
* Swap TAB with CTRL
** dconf > org > gnome > desktop > input-sources
- set xkb-options to ['ctrl:swapcaps']
Hm, there clearly is a bug that is triggered when you have a plain list at the very beginning of the document. However, later in the document this bug shouldn't cause any problems. So if you experience this issue everywhere in the document, there must be something else going on. It's going to be difficult to track this down if I can't reproduce the problem. Could you please try to come up with a minimal working example that reproduces the problem (emacs -q ...)?
The hang-up in buffer-initial plain lists should be fixed now. 2bc0e1f9c8947b9b5ac8d792bd7f6d2c36d294ab
Thanks for the commit and the explanation.
I can't reproduce it using emacs -Q, so it's definitely something in my configuration. Consider this closed, then, and thanks again for the kind support.
Well, it's still possible that guess-language interacts with other packages in a very unfortunate way. In this case, we should make changes to prevent this from happening. So if you find out what's going on, please let me know. Thanks! (Closing for now.)
Hi,
I've been experiencing a terrible slowdown in Org buffers recently, especially when inside tables, but also when just moving the cursor around partially collapsed headings. A quick profiling run showed that guess-language is the culprit, especially the call to how-many in guess-language-region.

It seems that the longer the Org file, the bigger the slowdown. I'm guessing that this may be caused by the fact that backward-paragraph in an Org buffer may travel very far back: in one particular Org file of mine, it moves almost all the way to the beginning of the buffer.

I tried the obvious thing, i.e., using org-backward-paragraph and org-forward-paragraph in guess-language-paragraph if major-mode is org-mode, but that didn't seem to have much of an effect. Perhaps you know of a better way to deal with the issue?