rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0
5.8k stars 272 forks source link

Rancher Desktop 1.14.2 wiped my .zshrc file. #7154

Closed conor-mccullough closed 1 month ago

conor-mccullough commented 1 month ago

Actual Behavior

Unfortunately it seems this recurring bug is still around. I just lost my .zhrc and .bashrc at a minimum.

Didn't have a recent backup handy, my bad, but I had been installing Rancher Desktop to test as a replacement for Docker Desktop and the fact that this appears to be something the devs aren't able to fix on a permanent basis has been enough to put me off using it in the future.

Steps to Reproduce

N/A

Result

N/A

Expected Behavior

I shouldn't need to worry about losing my entire shell config when toying around with resets/etc using Rancher Desktop.

Additional Information

No response

Rancher Desktop Version

1.14.2

Rancher Desktop K8s Version

1.30.2

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

MacOS Sonoma, M3 Pro

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

No response

jandubois commented 1 month ago

Unfortunately it seems this recurring bug is still around. I just lost my .zhrc and .bashrc at a minimum.

Didn't have a recent backup handy, my bad, but I had been installing Rancher Desktop to test as a replacement for Docker Desktop and the fact that this appears to be something the devs aren't able to fix on a permanent basis has been enough to put me off using it in the future.

I'm really sorry to hear that. This issue was supposed to be fixed in 1.12.2, and this is the first report that this has happened with any later release.

Do you have any information on how to reproduce this? Is this reproducible on your machine?

conor-mccullough commented 1 month ago

I only noticed it after around 3 hours of messing with Rancher Desktop - a series of factory resets, reinstalls, playing around with attempting to embed a CA, and some other stuff. My active terminal window lost its zshrc settings but I thought maybe it was just unloaded from that terminal session somehow so kept working and can't recall exactly what triggered it.

Currently I don't have the time to reliably reproduce this, I'm raising this here in hopes that it can serve as a point of reference in case others run into the same thing.

It replaced the contents with the following (Still available in my .bash_profile), if this helps narrow it down:

▶ cat ~/.bash_profile
### MANAGED BY RANCHER DESKTOP START (DO NOT EDIT)
export PATH="/Users/<user>/.rd/bin:$PATH"
### MANAGED BY RANCHER DESKTOP END (DO NOT EDIT)
amandeep-sharma commented 1 month ago

I also faced the same issue on Mac M2. It can be reproduced with below steps

mook-as commented 1 month ago

Thank you @amandeep-sharma — with those steps I can (sometimes) reproduce.

I think it was easier for me to reproduce because that error case made shutting down faster, so it triggered the actual problem:

I think the most important change here is to avoid doing the write when we're starting to shut down. Also, we should look into doing the write atomically; I think that involves (at least on Linux) using file renames, and therefore bailing if the target file is a symlink or other non-normal files.

jandubois commented 1 month ago

I think the most important change here is to avoid doing the write when we're starting to shut down.

Yes, and there needs to be some kind of mutex that prevents starting a shutdown while the write is in progress.

Also, we should look into doing the write atomically; I think that involves (at least on Linux) using file renames, and therefore bailing if the target file is a symlink or other non-normal files.

You could still fall back to the file copy when the rename is not possible. The important part is to keep a backup in place that will not be removed/overwritten even by repeated failed attempts to overwrite the original.

mook-as commented 1 month ago

Posted #7195 to at least not do the write on shutdown.

I think we should do a bit extra to help with the data loss, though; I'm thinking of changing the existing logic to something like:

(Edited to reflect the next four comments)

  1. If the target does not exist:
    1. If the desired state is true, write the contents into file (creating it).
    2. Return.
  2. If the target is a normal file:
    1. If the file has xattrs (doing a platform-specific check), bail.
    2. Copy target file (with COPYFILE_FICLONE) to <file>.rd-temp
    3. Read the contents of the file we just copied.
    4. Split the contents into before, current, after (i.e. the existing code).
    5. Set desired to the desired lines ([] if the desired state is false).
    6. If current is desired, return (since no change is needed).
    7. If before + desired + after is empty:
      1. Delete the target file.
      2. Delete the temporary file.
      3. Return.
    8. Write the before + desired + after content into the temporary file.
    9. Rename the temporary file into the target file.
    10. Return.
      1. File permissions should already be correct.
  3. If the target is a symlink:
    1. Copy target file (with COPYFILE_FICLONE + COPYFILE_EXCL) to <file>.rd-backup~
      1. This would throw an error if the symlink is dangling.
      2. This would also throw if the backup file already exists.
    2. Read the file (through the symlink).
    3. Split the contents into before, current, after (i.e. the existing code).
    4. Set desired to the desired lines ([] if the desired state is false).
    5. If current is desired, return (since no change is needed).
    6. Write the before + desired + after content into the file (through the symlink, even if it's empty).
    7. Read the file back out; if it does not match, throw an error.
      1. This would not remove <file>.rd-backup~
    8. Remove <file>.rd-backup~
  4. If the target is neither a normal file nor a symlink:
    1. Return with an error.

The main changes are:

Thoughts?**

jandubois commented 1 month ago
  1. If the existing file is not a plain file (i.e. it's a symlink or something), throw an error.

That is not really acceptable; it is a common use case[^1] to have config files in a Git repo, or shared cloud folder, and use symlinks to point to them.

[^1]: Of course "common use case" is also a euphemism for "that's what I'm doing". 😛

I think you can still follow the rest of your plan, except steps 8 and 9 need to be different when <file> is not a regular file:

  1. Write the new content to <file>.
  2. Verify that <file> contains the new content and throw an error if not.

You could also add an extra step

3.5 Verify that .<file>~rd-backup is identical to <file> and throw an error if not.

Another thing to add could be

8.5 Call sync to make sure everything is written back to disk.

I think step 3.5 and the alternate step 9 are paranoia, as long as the return codes of the write and close syscalls indicate no error, but it doesn't hurt to do a belt-and-suspenders thing here.

mook-as commented 1 month ago

I think 3.5 and sync are overkill; we're not attempting to be safe from the system crashing, just Rancher Desktop crashing. To rephrase: if the system crashes, the user is aware that things have went wrong.

I guess I can accept doing the alternate code path (not using a swap file) if the target is a symlink, though.

jandubois commented 1 month ago

It is a bit early to start bike-shedding about the filenames, but whatever, I'm already here:

.<file>~rd-backup

Given that <file> already starts with a ., I think the extra dot here is not needed. I would also prefer to use a proper file extension by using . instead of a ~, so:

<file>.rd-backup

If you really want to include the ~ to indicate "backup" (which is already in the name), then append it at the very end: <file>.rd-backup~.

This also is more in line with your .<file>.rd-swp filename, which I would call

<file>.rd-temp

The names should make sense to the user, who is already distressed about the original file missing/destroyed. They are unaware of the implementation details, so swap doesn't really make sense to them at this time.

mook-as commented 1 month ago

Oh, yeah, I don't actually plan to add a . in front, I dunno why I called it that. It's just going to be .bashrc~rd-backup or something. I can live with .bashrc.rd-backup~ though. Same with .bashrc.rd-temp.

mook-as commented 1 month ago

(Note that https://github.com/rancher-sandbox/rancher-desktop/issues/7154#issuecomment-2231957512 has been updated)

Hmm, while writing a bunch of tests for this (haven't started on the implementation yet), I think there are some corner cases to think about:

jandubois commented 1 month ago
  • If the target file is a symlink, the target only contains managed lines, and we want to remove those lines: do we remove the symlink (leaving the symlink target behind), remove the symlink target (leaving a dangling symlink), or remove both? Or leave an empty file at the target?

I think we should leave the empty file.

Another question though: if the symlink is dangling from the start, should we create the file? I would think it is better to throw an error, the same was as when the backup file already exists.

  • When we're dealing with plain files (not symlinks), do we still need the <file>.rd-backup~, given that we use <file>.rd-temp and rename into place?

Probably not. Theoretically it guards against the editing producing a broken new version of the file by keeping a backup. But since we are going to delete it once the rename succeeds, there isn't much of a point.

  • When doing plain files, do we need to do anything to preserve permissions and xattrs? How would we go about doing that? (Ideally we'd use clonefile(2) to make the temporary file first then overwriting its contents, but it's unclear if that's possible in NodeJS.)

On Linux you can use getfattr/setfattr and getfacl/setfacl.

On macOS it will require some research; you can probably use xattr, and some combination of ls -le and chmod.

I think it will be rather unusual to have xattrs and/or acls on the config files, and we could just disallow using "Automatic" path management in that case.


Another thing I just thought about: we need to make sure the code works correctly if e.g. ~/.bash_profile and ~/.profile are symlinks to the same target. I think it works correctly right now; just make sure there is no race.

mook-as commented 1 month ago

I think it will be rather unusual to have xattrs and/or acls on the config files, and we could just disallow using "Automatic" path management in that case.

That means we will have to detect that there are things on the config files. That's going to be… annoying. On the bright side, though, I missed that fs.promises.copyFile has a COPYFILE_FICLONE option, so we do have clonefile(). So I think we can create the temp file by using that, then clobbering its contents, then rename on top of the original.


This is incorrect: per the source, that just uses ioctl(…, FICLONE), so it has no effect on the (extended) attributes… https://github.com/libuv/libuv/blob/v1.48.0/src/unix/fs.c#L1227

jandubois commented 1 month ago

That means we will have to detect that there are things on the config files.

You can still do that with the commands I gave above.

Or find a Node module, e.g. https://github.com/LinusU/fs-xattr for attributes; didn't find anything quickly for ACLs.

we do have clonefile()

That is fine for macOS, but I wonder what it does on Linux. On btrfs and xfs it could to a reflink copy (don't know if it does).

I think this is too much specialized logic for an edge case. Just more code that can have bugs.

So I still think disallowing changes to path management when something unusual happens is the safest way:

In that case disable the path management checkboxes in the dialog and add a hover tip explaining the reason behind it. And force the setting in the app to "Manual" regardless what the actual file content is.

mook-as commented 1 month ago

Updated https://github.com/rancher-sandbox/rancher-desktop/issues/7154#issuecomment-2231957512 again.

So fs.promises.copyFile only preserves file permissions, and not xattrs; so we'll need to do some platform-specific code to handle xattrs (shelling out…). That, or we can, as suggested, just fail if the target file has any xattrs (including the Apple quarantine bit).

(I'm avoiding fs-xattr because that requires node-gyp again, and we made so much effort to try to get rid of it…)