sxyazi / yazi

💥 Blazing fast terminal file manager written in Rust, based on async I/O.
https://yazi-rs.github.io
MIT License
16.65k stars 388 forks source link

Copy tasks between some different file-systems (remote) hang with `(os error 95)` #1775

Closed ppenguin closed 1 month ago

ppenguin commented 1 month ago

What system are you running Yazi on?

Linux Wayland

What terminal are you running Yazi in?

kitty

yazi --debug output

Yazi
    Version: 0.3.3 (Nixpkgs 2024-09-04)
    Debug  : false
    OS     : linux-x86_64 (unix)

Ya
    Version: 0.3.3 (Nixpkgs 2024-09-04)

Emulator
    Emulator.via_env: ("xterm-kitty", "")
    Emulator.via_csi: Ok(Kitty)
    Emulator.detect : Kitty

Adapter
    Adapter.matches: Kitty

Desktop
    XDG_SESSION_TYPE           : Some("wayland")
    WAYLAND_DISPLAY            : Some("wayland-1")
    DISPLAY                    : Some(":0")
    SWAYSOCK                   : None
    HYPRLAND_INSTANCE_SIGNATURE: Some("0f594732b063a90d44df8c5d402d658f27471dfe_1728814821_51066977")
    WAYFIRE_SOCKET             : None

SSH
    shared.in_ssh_connection: false

WSL
    WSL: false

Variables
    SHELL              : Some("/run/current-system/sw/bin/zsh")
    EDITOR             : Some("nvim")
    VISUAL             : None
    YAZI_FILE_ONE      : None
    YAZI_CONFIG_HOME   : None

Text Opener
    default: Some(Opener { run: "${EDITOR:-vi} \"$@\"", block: true, orphan: false, desc: "$EDITOR", for_: None, spread: true })
    block  : Some(Opener { run: "${EDITOR:-vi} \"$@\"", block: true, orphan: false, desc: "$EDITOR", for_: None, spread: true })

Multiplexers
    TMUX               : false
    tmux version       : tmux 3.5a
    ZELLIJ_SESSION_NAME: None
    Zellij version     : No such file or directory (os error 2)

Dependencies
    file             : 5.45
    ueberzugpp       : No such file or directory (os error 2)
    ffmpegthumbnailer: 2.2.2
    magick           : 7.1.1-39
    fzf              : 0.55.0
    fd               : 10.2.0
    rg               : 14.1.1
    chafa            : 1.14.4
    zoxide           : 0.9.6
    7z               : 17.05
    7zz              : 24.08
    jq               : 1.7.1

Did you try the latest nightly build to see if the problem got fixed?

No, and I'll explain why below

Describe the bug

I'm copying files from a mounted ntfs3 volume (external USB HD) to a gvfs mounted samba share (i.e. accessed through /run/user/1000/gvfs/...).

I get many Failed to work on this task: Operation not supported (os error 95) errors, and all tasks with the same source/destination hang. No further information is given.

It is not unusual to get some errors messages when copying between file-systems, these almost always refer to the inability to copy some kind of extended attribute, however, this type of error is/should be always treated non-fatally (more like a warning), this is the behaviour of all other known software (at least on linux).

Therefore, "not trying harder" to just copy the files and hanging instead should be considered a bug.

Expectation: issues regarding copying of extended attributes are reported as warnings, but the copying tasks are completed if no "real" errors (i.e. no write premission, disk full, ...) occur.

Minimal reproducer

Not sure, probably copying files with some kind of extended attributes to a gvfs mounted remote samba share?

Anything else?

No response

sxyazi commented 1 month ago

Seems a dup of https://github.com/sxyazi/yazi/issues/1703

Could you try this comment https://github.com/sxyazi/yazi/issues/1703#issuecomment-2381377074?

ppenguin commented 1 month ago

Seems a dup of #1703

Could you try this comment #1703 (comment)?

Thanks a lot for the quick reply.

sxyazi commented 1 month ago

I'm currently not really able to spin up a rust file copy demonstrator to try bare rust functions like std::xxx... But maybe you know of a cli cp program that uses the same under the hood?

You can try Joshuto which it's also a file manager built with Rust

AidanJHMurphy commented 1 month ago

I've run into (what I think is) the same issue. Interestingly, running a shell command within yazi using the ":" key binding to create a file (e.g. touch test1.txt or echo "testing" > test2.txt) both succeed, as does using the 'a' key binding to create a file within yazi. It's only the yank-paste ("y" and "p" keybindings) workflow that seems to cause a hang.

It doesn't look like you have to be operating between different file systems either, as I was able to reproduce this yank-pasting a file from a samba share into the same directory. Also, it looks like the operation succeeds in creating the new file, but hangs at some point after that.

I've seen other discussions elsewhere with people having similar problems with the std::fs::copy function, and the hypothesis that "NTFS does not have chmod permissions, therefore attempting to set chmod permissions on a samba share will fail."

If I find the time, I'll try to play around with different versions of copy functions in Rust using my mounted file system, and see if I can isolate the issue and possibly suggest a fix.

sxyazi commented 1 month ago

Thanks @AidanJHMurphy, this info is really helpful!

It seems like Rust's std::fs::copy() does extra work related to file metadata (permissions) and raises a hard error for safety when it can't copy that metadata, even if the target filesystem doesn't support the operation at all.

I tried io::copy() and found it similar to the system call in fs::copy() on Linux, which means Yazi can still use copy_file_range to benefit from the Copy-on-write filesystem:

// fs::copy()

statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=13, ...}) = 0
openat(AT_FDCWD, "b", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0100644) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0
fchmod(4, 0100644)                      = 0
copy_file_range(3, NULL, 4, NULL, 1073741824, 0) = 13
copy_file_range(3, NULL, 4, NULL, 1073741824, 0) = 0
close(4)                                = 0
close(3)                                = 0
// io::copy()

statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=13, ...}) = 0
openat(AT_FDCWD, "b", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0100644) = 4
fchmod(4, 0100644)                      = 0
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=13, ...}) = 0
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0
copy_file_range(3, NULL, 4, NULL, 1073741824, 0) = 13
copy_file_range(3, NULL, 4, NULL, 1073741824, 0) = 0
close(4)                                = 0
close(3)                                = 0

But there are also some issues with this. It's currently unclear how macOS should implement it, since macOS's io::copy() just awkwardly copies byte by byte without using fclonefileat or fcopyfile.

That said, I'm not even sure if this problem exists on macOS, as no one has reported it yet.

Although Rust's fs::copy() implementation on macOS indeed treats failing to copy metadata as a hard error, it largely depends on whether the OS returns an error when these remote devices don't support metadata copying or fails silently.

Maybe we can try fixing it for Linux first. I'll create a PR for it later.

sxyazi commented 1 month ago

Please try https://github.com/sxyazi/yazi/pull/1805

ppenguin commented 1 month ago

Please try #1805

Thanks for looking into this so quickly! It has become better in that it actually copies the files successfully, nice!

BUT: in the progressbar an error is still reported (red) and the task never finishes (even after the files have actually been copied). The errors are still reported (per file?), which is no problem by itself, but needs an error/warning distinction (see below).

So the error handling is not yet fixed.

Also, it would probably be helpful in general if in the detailed report of a selected task (i.e. w -> <select> -> [enter]) the object (file) name would be listed in the error line.

It is probably necessary to distinguish between "errors" and "warnings", so that we could arrive at a task state where a task could be "finished with errors" or "finished with warnings".

/reopen

sxyazi commented 1 month ago

Thanks for the test @ppenguin!

The errors are still reported

Can you elaborate a bit more? What error is it? Is it still the same os error 95, or has there been a change?

Are you using the latest commit https://github.com/sxyazi/yazi/commit/df78827af5881d092c102320f735e22a98ea156b (what does your yazi -V say) since I submitted a patch after that PR to fix a few edge cases. Also a video demo of the operation would be helpful!

ppenguin commented 1 month ago

Thanks for the test @ppenguin!

The errors are still reported

Can you elaborate a bit more? What error is it? Is it still the same os error 95, or has there been a change?

Yes, it's the same list of same errors 95 (it seems there's one error report for every file if I select task details)

Are you using the latest commit df78827 (what does your yazi -V say) since I submitted a patch after that PR to fix a few edge cases. Also a video demo of the operation would be helpful!

> yazi -V
Yazi 0.3.3 (24a77d8 2024-10-19)

(last commit on master as of yesterday, two newer than the fix).

A side by side test "before/after" the fix:

Before: image image (Note that the status remains at 0%, files are created but remain 0 B) image

After image

Now it gets interesting, with the same (limited) file set, everything is copied, even without errors, but some of the copied files are shown as empty, even if they're not (so a "refresh" would be necessary).

With another file set (nested dirs and more/larger files), I get this: image (note the red progress info, it gets stuck at 90%, even though everything was copied successfully => some files were non-zero length, but not fully copied!!!) image

So in other words: the number of files for this case that is successfully copied went from "zero" to "more", but doesn't reliably work.

You might be able to reproduce this behaviour yourself by mounting an ntfs formatted external harddisk and a remote smb share and then copy files from the former to the latter.

sxyazi commented 1 month ago

This is really strange, I've checked the code several times and can't find any errors, basically, Yazi is just using io::copy() on Linux instead of fs::copy():

https://github.com/sxyazi/yazi/blob/24a77d8a5f07f15d3b7d4651c9acf1e372ea5aef/yazi-shared/src/fs/fns.rs#L255-L276

My best guess is that std::io::copy() itself isn't working. If that's the case, there's really nothing more Yazi can do since this is the lowest-level API provided by Rust.

It could also be related to your environment. If possible, I suggest trying to call std::io::copy() directly in a Rust app to see if it works.

I'll mark this issue as "help wanted" and reopen it for a while to see if anyone can reproduce it on their system and debug it at the code level. Any help would be greatly appreciated! 🙏🏻

ppenguin commented 1 month ago

Just as some extra info:

if I just do cp -av between the same src/tgt I get errors like

But all files are copied completely.

This probably means that in the above code (I don't know any rust) the part where you are handling permissions in the target (unsafe { libc::fchmod(writer.as_raw_fd(), cha.perm) };) will probably also not be successful. This is probably the place where a "best effort" should be made to finish the operation and throw a warning instead of an error.

Indeed, since both operations are async (copying content is decoupled from metadata?), this may also explain how we can end up with incomplete files?

To make sure that the contents (everything else than the metadata) has been indeed copied without difference, I'm not sure whether the part on L268~269 guarantees enough, or whether we'd need more checking?

Further, inspired by this thread, one might consider to make the metadata part a bit more elaborate, basically trying to copy different attributes from the metadata separately and allow failure for them (handle as warning). You could maybe even handle this in "fall-through" fashion, first try everything and then less, based on what errors come back based on the target file system.

sxyazi commented 1 month ago

Yes, the code above runs exactly as you described, with only the io::copy on line 268 being considered an error.

The subsequent libc::fchmod (for setting permissions) and set_times (for setting file access and modification times) attempt to execute as best as they can and fail silently. This means that even if they both fail, Yazi doesn’t see it as an error.

So, I really can’t imagine what could be causing the issue besides a problem with io::copy() itself. If it is indeed io::copy(), it likely means there’s no real "effective fix" for this issue since it’s the lowest-level API available to Yazi, and in Rust internally, it’s just a wrapper around the copy_file_range [1], and sendfile [2] system call:

ppenguin commented 1 month ago

So, I really can’t imagine what could be causing the issue besides a problem with io::copy() itself. If it is indeed io::copy(), it likely means there’s no real "effective fix" for this issue since it’s the lowest-level API available to Yazi, and in Rust internally, it’s just a wrapper around the copy_file_range [1], and sendfile [2] system call:

I just took a quick look for fun and education at the source of the "original" cp command (from coreutils) and surprise: it's thousands of loc long! In other words, I'm not sure we should be looking for "the lowest level API" but rather a higher level one, that is actually capable of handling all the complex ACL etc. stuff reliably. Hope this exists?

(I'd say this is pretty much mandatory, a copy program that can't handle the same as the cp baseline is ... well, not yet finished, I'm afraid (Same with e.g. this one)

sxyazi commented 1 month ago

that is actually capable of handling all the complex ACL etc. stuff reliably. Hope this exists?

Yeah, I would love to switch to such an API if it exists and would accept a PR as long as it's reliable and implemented in Rust — I don't want to introduce a C library because that would complicate the build.

I noticed that another issue you raised with rcp describes the same problem as Yazi https://github.com/wykurz/rcp/issues/26, which likely means this issue affects all Rust programs using std::fs::copy() or std::io::copy() - I took a look at the source of rcp, and it calls tokio::fs::copy(), which is an async wrapper of std::fs::copy().

In this case, I'm going to close this issue since it's beyond what Yazi can handle and also not just specific to Yazi. I really suggest testing std::io::copy() on your system, it's pretty easy to do, if that fails as well, it's best to report it to Rust so all Rust programs can benefit, not just Yazi.

ppenguin commented 1 month ago

I noticed that another issue you raised with rcp describes the same problem as Yazi [wykurz/rcp#26]

I actually raised that to make the point that probably many/most rust based programs suffer from this

In this case, I'm going to close this issue since it's beyond what Yazi can handle and also not just specific to Yazi. I really

I understand, but this is a huge thing IMO, which would warrant a big red sticker on the product saying it's severly limited and only works on a limited set of file systems.

suggest testing std::io::copy() on your system, it's pretty easy to do, if that fails as well, it's best to report it to Rust so all

Basically I think we have already proven that both io::copy and fs::copy just can't handle this, so to test this again may be unnecessary. But I'll see what I can do when I happen to be bored :stuck_out_tongue_winking_eye: (don't hold your breath though)

sxyazi commented 1 month ago

I just read the tokio source and noticed that its tokio::io::copy() isn't just a simple wrapper of std::io::copy(), but rather it implements its own userspace read-write loop.

I'm not sure if this is causing the issue, but I replaced it with std::io::copy() in https://github.com/sxyazi/yazi/pull/1817. Please give it a try; this is my last-ditch effort on this problem, hope it makes a difference 🤷🏻

ppenguin commented 4 weeks ago

Yes, I can confirm that #1817 works for the above test case! This is a keeper :+1: Thanks for your persistent efforts!