zyedidia / micro

A modern and intuitive terminal-based text editor
https://micro-editor.github.io
MIT License
24.42k stars 1.16k forks source link

Micro saves files non-atomically #3148

Open dmaluka opened 4 months ago

dmaluka commented 4 months ago

When saving a file, micro simply overwrites it with the new contents. So if writing is interrupted in the middle due to some error (e.g. running out of disk space, unplugging USB stick and so on), the file is left in a damaged state: partially written or just empty (i.e. the user's data is lost).

To prevent this, micro should save files atomically: first write to a temporary file, and then, if write completes successfully, replace the user's file with this temporary file. AFAIK this is how it is done by other editors.

This issue concerns not just the edited files themselves, but also other files that micro may modify: settings.json, bindings.json, backups and so on.

Commit hash: 59dda01c OS: any Terminal: any

dmaluka commented 4 months ago

This issue concerns not just the edited files themselves, but also other files that micro may modify: settings.json, bindings.json, backups and so on.

This actually happened to me today in real life: I started fetching a huge repo in the background, and went on using micro. The fetch eventually ate up all the disk space, and when I started micro yet another time, it showed me Error reading settings.json: unexpected end of JSON input error, since my settings.json was empty; and the command history was empty too. Luckily the actual files I was working with using micro weren't damaged, since I wasn't really modifying them.

dmaluka commented 4 months ago

I see other people have also already encountered this problem: https://github.com/zyedidia/micro/issues/1916

JoeKar commented 4 months ago

So like in the most of the editors overwriteFile() should write to name + e.g. ".tmp" and then use os.Rename() to move?

dmaluka commented 4 months ago

More or less. We might need to figure out how to do it fully reliably (or as reliably as possible), e.g. what if the file is a symlink or a bind mount, and so on. And what about Windows (I heard it's trickier to ensure atomicity there).

JoeKar commented 4 months ago

Had already an look into the os interfaces, but unfortunately it doesn't provide that capability (like copy-on-write and move afterwards) with that abstraction already. And yes, Windows could become tricky as well as the integration into the present su-approach. So from current perspective it looks like reinventing the wheel again. :(

dmaluka commented 4 months ago

There are some 3rd-party packages e.g. renameio but I haven't looked into that deeply.

JoeKar commented 4 months ago

There are some 3rd-party packages e.g. renameio but I haven't looked into that deeply.

A naive approach to the problem is to create a temporary file followed by a call to os.Rename().

...and so I was blamed again. :smile: But it sounds worth to it add it as a new dependency, because we then don't need to take care of these corner cases, because someone already did and implemented it for general purpose.

As it is not possible to provide a correct implementation, this package does not export any functions on Windows.

On Windows we can still run into trouble. Due to writefile.go#L15-L16 it looks like we can use this interface under POSIX only.

X-Ryl669 commented 3 months ago

Seems like any solution (even the naïve solution with os.rename) would be better than the current situation where a corruption is guaranteed to occur.

dmaluka commented 3 months ago

Yes and no. If for example the user's file is a symlink, and we just rename() it with our temporary file, the user's file silently becomes a regular file instead of a symlink. Which is arguably even worse than the current situation, since it will happen always, not just when an I/O failure occurs.

X-Ryl669 commented 3 months ago

Honestly, I don't know if it's a Go limitation or not. On Posix, you're safe to use renameat that will do exactly what you expect since you provide opened file descriptor for the rename operation, thus you've already followed the link to get the file descriptor.

If it's not available everywhere, I guess a readlink is to ensure the target of the link is selected for the new parameter of rename (but it suffers from TOCTOU issues as usual)

X-Ryl669 commented 3 months ago

Also, I don't agree with the idea that leaving a temporary file upon failure is bad. I think, on the opposite that it's a welcome feature. If I edit a file other a SSH session and the network suddenly goes down or a power outage, I'd be a lot more happy to find a .tmp file with the current edited buffer than nothing (like if you used O_TMPFILE).

dmaluka commented 3 months ago

Also, I don't agree with the idea that leaving a temporary file upon failure is bad.

Sure, we should keep the temporary file upon a failure, since it contains user data which would be lost otherwise. But I was referring to a different case. I was referring to a case when there is no failure, and as a result, the user's symlink is silently overwritten with a regular file (although the contents of this file are correct). So the user now suddenly has two version of the file (with the old contents and with the new contents) instead of two references to the same file (with the new contents), and has no idea about it.

On Posix, you're safe to use renameat that will do exactly what you expect since you provide opened file descriptor for the rename operation, thus you've already followed the link to get the file descriptor.

I don't think this is true. The descriptor renameat() is provided with is the descriptor of the directory where the file is, not the descriptor of the file itself.

For what it's worth:

~ % echo aaa >test1
~ % ln -s test1 test2
~ % ls -l test1 test2
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:22 test1
lrwxrwxrwx 1 mitya mitya 5 Mar 14 12:22 test2 -> test1
~ % cat test2
aaa
~ % echo bbb >test3
~ % mv test3 test2
~ % ls -l test1 test2
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:22 test1
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:23 test2
~ % cat test2
bbb
~ % cat test1
aaa

and the strace of mv (the gist of it):

renameat2(AT_FDCWD, "test3", AT_FDCWD, "test2", RENAME_NOREPLACE) = -1 EEXIST (File exists)
stat("test2", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
newfstatat(AT_FDCWD, "test3", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "test2", {st_mode=S_IFLNK|0777, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
rename("test3", "test2")                = 0
dmaluka commented 3 months ago

BTW it should be worth looking how Vim handles the issue when saving files (and analyzing whether what Vim is doing is reasonable). At first glance it looks like it reuses the backup file for that (or rather, what vim calls the "backup file" is only used when saving the file): vim first tries to overwrite the target file with the contents of the backup file, and if that fails, it resorts to rename()ing the backup file to the target file.

X-Ryl669 commented 3 months ago

You're right. I've confused linkat with renameat. You have a AT_FOLLOW_LINKS or something like this but it's following the old symlink not the new one. There isn't any function to do that atomically on linux.

So you're back to the basic option of doing a readlink on the destination (if it's a link) and update the path accordingly.

That's what vim is doing