lewang / ws-butler

Unobtrusively trim extraneous white-space *ONLY* in lines edited.
242 stars 26 forks source link

Deleting previous line causes next line to be cleaned even if not touched #7

Closed wildlava closed 10 years ago

wildlava commented 10 years ago

If I have a file with an existing line that has trailing whitespace, and I delete the line just above it, saving causes the untouched line below it to be cleaned of whitespace.

wildlava commented 10 years ago

A little debugging tells me that the problem lies in calculation of "beg" and "end". "end" gets set to the end of the following line (which is untouched) if there are no characters on the current line. I'm still looking into it...

lewang commented 10 years ago

This is probably an issue. IIRC when I did consciously decide to be more aggressive, but we can fix the case where "end" is a BOL.

wildlava commented 10 years ago

Note also that for each region in the hilit-chg buffer, ws-butler-map-changes always sets the next start to the previous end, so the real char position at the end of the region might be (end-1) rather than end. Also, the text-property-not-all function finds the first char that differs from "'hilit-chg", so it makes sense that this is one past the end of the region. I tried changing the following in ws-butler-before-save:

    (setq beg (progn (goto-char beg)
                      (point-at-bol))
           end (progn (goto-char (1- end))
                      (point-at-eol))))

i.e., I changed "end" to "(1- end)", and it seems to work in cases I have tried.

However, I am not sure why some cases make the end 1 greater. But they seem to be ones involving deleting (backspace). If I put "a" on the line before the one with trailing whitespace, things work now as before (end is 1 greater than start). If I type "a", then backspace, "end" is one greater than expected (also one greater than start). Same for "a", backspace, "a". In this case, end is 2 larger than start. In the "a" then backspace case, "prop" is hilit-chg-delete.

It seems to have to do with the hilit-chg buffer storing records of deletions. I'll look more into it. I am not sure if using end-1 is proper without knowing a bit more about how this works.

wildlava commented 10 years ago

Instead of the above, which would (I assume) erroneously decrement "limit" if the region is at the end (so "end" is nil), the following would be a better change:

-          (funcall func prop start (or end limit)))
+          (if end
+              (funcall func prop start (1- end))
+            (funcall func prop start limit)))

Let me know what you think.

lewang commented 10 years ago

The problem is that ws-butler relies on highlight-changes-mode, which works based on text properties; and only character can have text properties.

So in your example, if you kill one line, then the change is recorded as a single character change in the line after it. Now we cannot differentiate between an actual single character change against one caused by deletion. So we simply have to clean up the line.

I don't see a way around this.

The spirit of ws-butler is to clean up only lines we touch, I think it acceptable to consider the line immediately after one we kill to be "touched".

wildlava commented 10 years ago

I agree regarding killing a line, but my two examples did not kill a line, but rather just the contents of the line (i.e. they did not delete the newline).

Each line in the buffer has the characters on it plus a newline. So what I have seen so far is that it is the newline that gets the property "hilit-chg-delete" if characters on that line are deleted.

I believe that "end" points to the first character of the following region, not the end of the current one (this must be the case, since the loop in ws-butler-map-changes sets start to the previous end). I think this may be the cause of the issue.

For example, if I have trailing spaces on the line following an empty line, and I put "a" on the empty line, saving does the right thing. In my case, start=4 and end=5. The "a" character position (4) has "hilit-chg" as expected, and the end of the region is really the "a" (not the newline after), probably because no delete was performed. So the region's end is also really 4.

However, if I type "a" then backspace on the blank line, I still get start=4 and end=5. The only char there now is the newline, since the "a" is gone, and that newline (at location 4) has property "hilit-chg-delete". I assume this is to indicate that chars were deleted from this line. Still, the region should contain only the char at location 4 (the newline). Since the current code sets end to 5, which is the first character on the following line, that causes the issue.

If I type "a", then backspace, then "a", I get start=4 and end=6. There are now 2 characters in the region ("a" and newline), and "a" has the property "hilit-chg" (I did not yet check the newline property, but I would bet it is "hilit-chg-delete"). So again, end is one larger than it should be, pointing to the first char of the following line (which is in the following region).

Bottom line, I believe that decrementing end by one fixes this. However, I would like to confirm my theory that the newline is where the delete property is stored intentionally. Also, I am not sure if "limit" (which is used if end is nil) is one larger than the end or the actual end, so not sure if we should decrement that as well). Still experimenting...

wildlava commented 10 years ago

Further reading tells me that text property regions have start/end between chars, not on them. So if start=4 and end=6, there are two chars, one at 4 and one at 5. I assume "limit" works the same way, so decrementing "end" in the ws-butler-before-save func is probably correct:

         (setq beg (progn (goto-char beg)
                          (point-at-bol))
               end (progn (goto-char (1- end))
                          (point-at-eol))))
lewang commented 10 years ago

Further reading tells me that text property regions have start/end between chars, not on them.

Do you have a reference?

If I type "a", then backspace, then "a", I get start=4 and end=6.

I don't see this. I see a single character change on the character after "a"

wildlava commented 10 years ago

Check this link: http://www.gnu.org/software/emacs/manual/html_node/elisp/Positions.html#Positions

"More precisely, a position identifies the place between two characters (or before the first character, or after the last character), so we can speak of the character before or after a given position. However, we often speak of the character “at” a position, meaning the character after that position."

Also this link: http://www.gnu.org/software/emacs/manual/html_node/elisp/Point.html

"Like other positions, point designates a place between two characters (or before the first character, or after the last character), rather than a particular character. Usually terminals display the cursor over the character that immediately follows point; point is actually before the character on which the cursor sits."

And note here: http://www.gnu.org/software/emacs/manual/html_node/elisp/Property-Search.html

So the functions, e.g. text-property-not-all (used in ws-butler-map-changes), speak of characters "between start and end", which is what prompted me to learn more about how positions work. The characters are always between the positions, which removes ambiguity and makes sense of the doc for these functions.

Also, I experimented with debugging the blank line at EOB code to find out what "limit" and "end" mean in that context. It appears to be the same:

A character's position (or a region start) is right before the character at that position, but the end of a region (or limit) lies right after the last character.

As for how I saw start=4 and end=6, I set a debug break point right after these are set, which happens on save. I had a buffer that already had whitespace on the line below a blank line. I put a "a" on that blank line, hit backspace, erasing it, and hit "a" again. I got 4 and 6. So by the emacs definitions, this region bounds exactly 2 characters: the one after position 4 and the one before position 6 (or to say it another way: the char at pos 4 and the one at pos 5). The resulting clean cleans the line below the "a" as well. My fix fixes this and causes ws-butler to behave exactly like ws-trim in this respect.

lewang commented 10 years ago

I could not reproduce what you get with 4&6, but that may not be so important.

I'm not sure if we are on the same page. The fact is if you insert a single space, the changed region is x, x+1. If you delete a line, the change is y,y+1. There is no way to differentiate between the two.

wildlava commented 10 years ago

Here is the difference I saw:

If I insert just one "a" (or space or whatever), I get start=4 and end=5.

If I insert "a", backspace, and insert "a", I get start=4 and end=6.

In either case, the line has two characters: an "a" and a newline. But the properties of these two must be different, causing the different region size.

In any case, it really doesn't matter. The "end" value set should never enter the next line. Let's take the case of start=4 and end=6:

The code takes "end" as returned by the map-changes call and does a "goto-char", which in the case where the region is two characters places the position on the first character of the next line. It then does a "point-at-eol", placing region end at the end of the line. This next line then gets erroneously cleaned.

In reality, the "goto-char" should place the position on the newline, which is part of the original line, not the next. The "point-at-eol" then would just position at the end of that line, not the next.

The problem, I believe, is that "goto-char" puts the position right before the character, so if "end" is really after the newline, it ends up pointing to the next char, which is the first of the second line, so if we are treating position as a charactger position and not a region end (which is what "goto-char" is doing), we need to decrement the end by one before doing this.

To reproduce what I did, take the following file:


hi





The underscores above are spaces. The dashes just represent the start and end of the file. I created the file with another editor to intentionally have trailing whitespace. Now go tot he second line and add "a", backspace, "a". You'll see:


hi a





Note that the "h" is the first char. "a" is at 4 (be sure to count the newlines), and the next line's first space is at 6 (the previous line ends at position 6 as well, since positions are between chars). So in the case that the region is 4 to 6, the current code goes to position 6 (the char after pos 6 - the space on the next line), then goes to the end of line. This line gets cleaned when it should not.

If we set beg = 4 (start) and end = 5 (end - 1), we get the right behavior. It's all in how positions are defined. The context of position as after the last char does not equate to position representing the character after it. So depending on the function called and whether it is interpreting a position as pointing to a char or defining a region is why we need to decrement end (taking a region end interpretation and turning it into a value pointing to a character).

Sorry, this is hard to describe in text, I know.

wildlava commented 10 years ago

Hm, the file depictions in the above message didn't really show up correctly. Try these:

hi
____
____
____

And after the "a", backspace, "a":

hi
a
____
____
____
wildlava commented 10 years ago

OK, I know the above graphics are not too clear. Here's a better graphic...

| "h  | "i" | nl | "a" | nl | sp | sp | sp | sp |
1     2     3    4     5    6    7    8    9   10

In the case when the "a" is added on the second (blank) line, the region comes back as 4 - 5 (which contains only the "a"), so in the current code, the goto-char goes to position 5, which is the newline. Then it goes to the end of that line, which is stil the same line, and that line is cleaned. Works fine.

In the second case, when "a", backspace, "a" are typed on the second line, region 4 - 6 is returned, which contains the "a" and the newline. But goto-char, when going to position 6, goes to the space on the following line (since the characte is after its position). Then it goes to the end of that line and cleans it and the previous line, which is not intended.

In the first case, the last char of the region is "a", and in the second, it is the newline, which are character positions 4 and 5, respectively. This is why end needs to be set to end-1 to correctly goto-char the right character at the end of the region returned.

lewang commented 10 years ago

OK. I understand the second case now. This seems to be a bug in hilit-chg.el.

lewang commented 10 years ago

I've filed a bug (17784) in Emacs. In the mean time, your proposal of removing 1 from end seems sound.

lewang commented 10 years ago

Thanks for taking the extra time to explain the problem. It was much clearer with the illustrated example.

wildlava commented 10 years ago

Thanks! And we'll have to see what responses to the bug report will be. I do not know all the rules for the highlight mode, but I think you said the extra char was not marked with a property. If that is the case, it does sound like a bug. I had assumed that maybe they marked the newline with "hilit-chg-delete" (but I did not check) as a way of recording the fact that char(s) were deleted, since the case in which I hit just "a" then backspace (leaving nothing by the newline) showed the newline marked with hilit-chg-delete.

wildlava commented 10 years ago

Oh, and no matter what, I still believe decrementing end is correct due to the definition of positions in emacs, so even if they fix the bug (assuming it is a bug), I think you are doing the correct thing now.