pkg / sftp

SFTP support for the go.crypto/ssh package
BSD 2-Clause "Simplified" License
1.52k stars 380 forks source link

Bug: wrong error message returned by Client.Remove #468

Open awaken opened 3 years ago

awaken commented 3 years ago

Hi, Great job! I really appreciate your effort for maintaining this library! I mainly use your library inside a proprietary Big Data project, that is running in production since 2 years ago, using Linux VMs in the cloud, and moves many TBs of data per day.

func (c *Client) Remove(path string) error can return an error for many reasons, of course. In most cases, the library just works fine.

If Remove is called for removing just a file (and not a directory), in some cases I experienced that Remove can return an error containing this message: "Not a directory". In my opinion, it is a bug.

Looking at the Remove implementation, I see that, in case err.Code is equal to sshFxFailure or if there is a permission error, the returned error is the one got from RemoveDirectory. I think if Remove is called for removing a file, it should never return an error with a message like "Not a directory".

I would suggest this fixed implementation:

func (c *Client) Remove(path string) error {
    err := c.removeFile(path)
    // some servers, *cough* osx *cough*, return EPERM, not ENODIR.
    // serv-u returns ssh_FX_FILE_IS_A_DIRECTORY
    // EPERM is converted to os.ErrPermission so it is not a StatusError
    if err2, ok := err.(*StatusError); ok {
        switch err2.Code {
        case sshFxFailure:
            if c.RemoveDirectory(path) == nil { return nil }
            return err
        case sshFxFileIsADirectory:
            return c.RemoveDirectory(path)
        }
    }
    if os.IsPermission(err) {
        if c.RemoveDirectory(path) == nil { return nil }
    }
    return err
}

Thanks for your attention.

puellanivis commented 3 years ago

Hm… yeah, that does seem like a confusing error to be returning. Looking into the Go source code for os.Remove https://cs.opensource.google/go/go/+/refs/tags/go1.17:src/os/file_unix.go;l=314-322

🤔 Hm… curious. Not sure if your proposed change would actually fix things for all cases, or just for this one case. I’ll be on a plane flying back to Germany in 24-hours, so I’ll ponder on things and try and get something going when I get back.

Alternatively, I’m supposed to be working on reimplementing https://github.com/pkg/sftp/pull/432 (there were so many merge conflicts rebasing the PR, I’ve moved on to reimplementing) and this seems like something that would be good to address in that larger refactor, rather than giving too much priority into a confusing error message. 🤔

P.S.: Do you happen to know which server implementation is on the other end when you’re getting this error message? Ideally, we could address this a little better, if we knew what was generating the error.

awaken commented 3 years ago

Hi, that's a server from one of our clients. I can only provide the motd: SSH-2.0-mod_sftp/0.9.9

awaken commented 3 years ago

I forgot to mention that the "Not a directory" error messages (that I experienced) is always of SSH_FX_FAILURE kind.

drakkan commented 3 years ago

Hi, that's a server from one of our clients. I can only provide the motd: SSH-2.0-mod_sftp/0.9.9

This is ProFTPD

awaken commented 3 years ago

Hm… yeah, that does seem like a confusing error to be returning. Looking into the Go source code for os.Remove https://cs.opensource.google/go/go/+/refs/tags/go1.17:src/os/file_unix.go;l=314-322

🤔 Hm… curious. Not sure if your proposed change would actually fix things for all cases, or just for this one case. I’ll be on a plane flying back to Germany in 24-hours, so I’ll ponder on things and try and get something going when I get back.

Let's say some SFTP user can properly read a file, but he doesn't have any writing permission. In this case, any Client.Remove attempt will produce a "Not a directory" error.

In my opinion, it is more than just a wrong message, because the error is claiming an incorrect reason about the removal failure. Right now, my application can notify wrong failure reasons.

It seems to me that my proposed fix acts just like os.Remove, about the logic for returning the error. If the file removal attempt fails because the file is a directory, then return the error obtained from the directory removal attempt, otherwise just return the error from the file removal attempt. That's the idea, right?

puellanivis commented 3 years ago

The problem here is that the os.Remove only returns the syscall.Unlink error if the syscall.Rmdir is ENOTDIR. For instance, if the filesystem object exists, is a directory, and is not empty, os.Remove will return “directory not empty” not “is a directory”.

The change you have proposed would return “is a directory” for a non-empty directory, with an error message that is now just as confusing as returning not a directory for a file in a directory without write permission. (Note: a user without write permission to a file can unlink() that file as long as they have write permission to the directory containing it. This is a known security loophole, where someone can copy a file, and recreate it with write permissions (thus escalating permission access to that file) as long as they have write permission to the directory itself.)

puellanivis commented 3 years ago

So, I’m starting to think that matching the implementation used by os.Remove is the cause of this issue. Thinking about why os.Remove exists as both file and directory deletion, I figured it might be because Plan 9 only offers a single syscall that defines this behavior, and looking up the source code. Yes, this is absolutely the case. Getting Plan 9 to implement a POSIX unlink and rmdir would have taken a lot more effort than simply combining both for the POSIX implementations.

The windows implementation does a Stat on the file if both failed, which it then uses to figure out if it is a directory, so return RemoveDirectory error; or a file, so return the DeleteFile error.

We could potentially do something similar, but this would be another roundtrip request. Although, this would save us from having to parse out the error string to figure out which ones we shouldn’t return. (i.e. something like returning the SSH_FXP_REMOVE error unless the SSH_FXP_RMDIR error is not the string “not a directory”, which is obviously fragile, since it might be something like ”ist kein Verzeichnis” or other such localized string.)

We should be able to test for the possibility of a SSH_FX_NOT_A_DIRECTORY and if the error is not that, then we Stat the file to determine if directory or file, (and as a fail-safe, return the SSH_FXP_REMOVE response if that Stat fails).

I’ll work this into my client updates, when I get into it again.

awaken commented 3 years ago

does a Stat on the file if both failed, which it then uses to figure out if it is a directory, so return RemoveDirectory error; or a file, so return the DeleteFile error.

This seems a good and easy solution!