zyedidia / micro

A modern and intuitive terminal-based text editor
https://micro-editor.github.io
MIT License
24.39k stars 1.16k forks source link

Garbled contents when reloading file with DOS line endings #3303

Open scurest opened 1 month ago

scurest commented 1 month ago

Description of the problem

When reloading a DOS-mode file (CRLF line endings) that has changed on disk, the file contents are garbled.

Steps to reproduce

To help repro, here's a Python script that writes a random test.txt with DOS line endings.

# a.py
import random, sys
random.seed(sys.argv[1])
with open('test.txt', 'w') as f:
    for _ in range(6):
        f.write(str(random.randint(0,999)))
        f.write("\r\n")

I run python a.py 1 and then open the test file with micro test.txt.

Then I run python a.py 2 to change the file. Micro prompts me to reload: The file on disk has changed. Reload file? (y,n,esc). I hit y. I get

Note that this is not a display issue, the actual contents are garbled. You can keep repeating this, python a.py 3 and so on.

Note that this does NOT happen if a.py is changed to write \n line endings AND if micro initially opened the file in unix mode.

Specifications

Commit hash: 68d88b57 OS: Arch Linux Terminal: xterm 390-2

Gavin-Holt commented 1 month ago

Hi

I noted this problem a while back (https://github.com/zyedidia/micro/issues/2756), but could not make it reproducible.

At that time I assumed it was due to my inferior OS!

We should be careful not to confuse reopen and reload these are different commands:

Confusingly, settings.json includes an option named reload:

I would be interested in a fix, so my spellchecking function (https://github.com/zyedidia/micro/discussions/2744) correctly reloads altered files.

Kind Regards Gavin Holt

dmaluka commented 1 month ago

FWIW I can easily reproduce it, with the newest micro from master.

  1. open a file with DOS line endings (\r\n)
  2. open the same file in another editor (or even in another instance of micro), add any text somewhere in the middle of the file, and save it
  3. reload the file (i.e. answer "yes" to the reload prompt) in the original instance of micro => result: the new text is inserted in the wrong place.

It's clearly a bug, and hopefully easy to fix (just need to find free time to do that).

We should be careful not to confuse reopen and reload these are different commands:

Yeah, they are doing completely different things.

  • reopen : undocumented command to reopen current buffer using bp.Buf:ReOpen() function

Indeed, and it might be a good moment to document it.

  • reload : this command reloads all runtime files (bindings.json, settings.json, init.lua, ?plugins?)

Yep, it reloads plugins too. We are considering improving its documentation: https://github.com/zyedidia/micro/pull/3240#pullrequestreview-2048442423

Confusingly, settings.json includes an option named reload:

  • "controls the reload behavior of the current buffer in case the file has changed. The available options are prompt, auto & disabled" which calls bp.Buf:ReOpen() !

The reopen command does the same thing that is done automatically if the reload option is set to prompt or auto, i.e. reloads the file. So it's not surprising that ReOpen() is called in both cases.

Yeah, it might be confusing that the reload command (not option) does a completely different thing.

scurest commented 1 month ago

Thanks for the pointer to reopen. So it looks like the error occurs in EventHandler.ApplyDiff. Here's a fragment of the diff operations performed

State of LineArray
  "569"
  "489"
  "45"

Diff to Process
  {"Equal" "9\r\n"} @ {X:2 Y:0}

State of LineArray
  "569"
  "489"
  "45"

Diff to Process
  {"Delete" "48"} @ {X:1 Y:1}      <- NOTE: should be {X:0 Y:1}

State of LineArray
  "569"
  "4"           <- NOTE: 89 deleted instead of 48
  "45"

So the error seems to be that when \r\n is encountered, the location advances two positions to the right, instead of only one. This misaligns the locations for all subsequent diff operations, which produces the garbled text.

JoeKar commented 1 month ago

So the error seems to be that when \r\n is encountered, the location advances two positions to the right, instead of only one.

Thank you for the great hint!

https://github.com/zyedidia/micro/blob/35630aa736f85f21d70bfba96a86c30ae2f4baf0/internal/util/unicode.go#L90

->

if r != '\r' && !isMark(r) {

Most probably there is a more elegant and especially generic way, since this was just shot out of my hip. :wink:

@scurest: Does the proposal work for you too?

scurest commented 1 month ago

ApplyDiff is also wrong in the presence of marks.

echo -e 'Te\u0302st' > test.txt  # COMBINING CIRCUMFLEX ACCENT
# Open in micro, shows 'Têst'
echo -e 'Test' > test.txt
# Reload in micro, still shows 'Têst'
Gavin-Holt commented 1 month ago

Hi,

ApplyDiff sounds complicated, and not strictly necessary for reopen.

Could reopen just read the whole disc copy and reposition the cursor to the last known editing location?

Kind Regards Gavin Holt

PS. I would also expect reopen to clear the undo stack.

scurest commented 1 month ago

ApplyDiff maintains the "logical" cursor position.

echo 'The bottle says DRINK ME' > test.txt
# Open in micro, 'The bottle says DRINK |ME' (| denotes cursor location)
echo 'The cake says EAT ME' > test.txt
# Reload in micro, 'The cake says EAT |ME' (cursor maintained at ME)

Is this the only reason it's used? I would also be fine giving this up for correct reopen.

Also the line-ending mode is not re-detected when reopening.

JoeKar commented 1 month ago

Is this the only reason it's used?

No. The main purpose is to handle every insertion and/or deletion independent to track it inside the undo buffer. Furthermore the modifications can be visualized with the diffgutter. The cursor position is handled here: https://github.com/zyedidia/micro/blob/e9bd1b35f4ee62352a361fdec94684accd6f4874/internal/buffer/buffer.go#L560

So bypassing or removing ApplyDiff is no option so far. It just need to work correctly in these scenarios.

Currently you broke it down to some minimal reproduction cases, which is good so far. Just image the use case where you've a much larger file which is edited outside of you current buffer...maybe you would like to see the diff and maybe you'd like to undo something.