pkg / sftp

SFTP support for the go.crypto/ssh package
BSD 2-Clause "Simplified" License

function `WriteTo` returns 'file does not exist' #510

Open shimeng930 opened 2 years ago

shimeng930 commented 2 years ago

if f, err = s.client.Open(path.Join(remotePath, fileName)); err != nil {
    return err
}

if err = os.MkdirAll(localPath, os.ModePerm); err != nil {
    return err
}
if localFile, err = os.Create(path.Join(localPath, fileName)); err != nil {
    return err
}
if _, err := f.WriteTo(localFile); err != nil {
    return err
}

Open succeeds, and I am sure the SFTP server has the file, but the WriteTo call returns an error: 'file does not exist'.

shimeng930 commented 2 years ago

Whether I use Fstat or Stat, the error message I get is `The message [path/filename] is not extractable`.

puellanivis commented 2 years ago

Can you wrap each return with fmt.Errorf("<unique prefix per return>: %w", err) so we can isolate where the error is coming from?

These errors are also not coming from this package, but the server. You may be attempting to do something incompatible with the way the server works.

shimeng930 commented 2 years ago

Yep, the error `The message [path/filename] is not extractable` is from the server; the sftp package just passes it through. I have asked the third-party server team separately. Here I'd like to know whether anyone has run into this kind of issue, and what I should configure in code to fix it.

puellanivis commented 2 years ago

The other possibility is that the file is being deleted before the transfer. WriteTo can sometimes trigger an attempt to determine the size of the file, which can make some remote systems think that the file has been completely read, after which they delete it. Check if https://pkg.go.dev/github.com/pkg/sftp#UseFstat helps you.

If it does (or doesn't), another option is to implement io.WriterTo yourself with a simple Read into a buffer followed by a Write. It won't be particularly fast, as each operation must round-trip before the next Read is issued, but it has higher compatibility with these sorts of corner cases.

sogyals429 commented 2 years ago

Hi, I also seem to be facing the same issue. Oddly, if I connect to the SFTP server using the CLI, I can download the file locally without any issues. One thing I noticed: if I download and then upload the same file back to the server, the package works completely fine. P.S. there are no permission changes on the file.

puellanivis commented 2 years ago

As I noted, IBM Sterling SFTP servers (and some other "high-security" SFTP server configurations) are known to have compatibility issues with our attempt to figure out how big a file is before downloading it concurrently.

I’m starting to think that maybe we should build a “WriteTo” alternative which is built for maximum compatibility. 🤔

shimeng930 commented 2 years ago

> The other possibility is that the file is being deleted before the transfer. The WriteTo can sometimes trigger an attempt to determine the size of the file, which can make some remote systems think that the file has been completely read, after which it then deletes the file. Check if https://pkg.go.dev/github.com/pkg/sftp#UseFstat helps you.
>
> If it does (or doesn't), another option is to implement the io.WriterTo yourself with a simple Read into a buffer and Write. It won't be particularly fast, as each operation will need to round-trip before the next Read is issued, but it has a higher compatibility with these sorts of corner cases.

Yep:

  1. First, UseFstat was not useful.
  2. I used a buffered Read/Write loop and the download worked.

shimeng930 commented 2 years ago

> As I noted, IBM Sterling SFTP servers (and some other "high-security" SFTP server configurations) are known to have compatibility issues with us trying to figure out how big a file is before downloading it concurrently.
>
> I'm starting to think that maybe we should build a "WriteTo" alternative which is built for maximum compatibility. 🤔

I think a "WriteTo" alternative is necessary.

puellanivis commented 2 years ago

> i think a "WriteTo" alternative is necessary

I mean, remember this won't be automatically used by anything like io.Copy, so it's not actually going to solve many of the most common issues. I've been putting off doing the client rewrite, but I think this is something I'll want to keep in mind: we might want a much more compatible WriteTo process, even if it means we over-provision goroutines for smaller files…

theGuen commented 2 years ago

Hi, I guess I had the same problem with the disappearing files. There are about 20 files with the same name, one for every day. The Fstat always deletes the newest, and you read an older one, until you have only one left… a total mess…

I have changed WriteTo to preserve the concurrency. More of an ugly solution, but it seems to work for my case…

func (f *File) WriteTo(w io.Writer, fileInfo os.FileInfo) (written int64, err error) {
    f.mu.Lock()
    defer f.mu.Unlock()

    if f.c.disableConcurrentReads {
        return f.writeToSequential(w)
    }

    // For concurrency, we want to guess how many concurrent workers we should use.
    fileSize := uint64(fileInfo.Size())
    mode := uint32(fileInfo.Mode())

    if fileSize <= uint64(f.c.maxPacket) || !isRegular(mode) {
...

I take the fileInfo from ReadDir

fis, err := sftpClient.ReadDir("/Out")
if err != nil {
    return err
}
var fileInfo os.FileInfo
for i, v := range fis {
    fmt.Println("LIST", i, v.Name(), v.ModTime(), v.Size())
    // filter the right fileName here
    // in my case there are 20 files with the same name ???
    fileInfo = v
}
...
puellanivis commented 2 years ago

WriteTo has to satisfy the io.WriterTo interface (https://pkg.go.dev/io@go1.18.3#WriterTo), so we cannot change the function signature.

However, in my planned client re-rewrite, I think I should just ditch the whole "guess the best number of concurrent requests" approach entirely. Even if this causes small-file inefficiencies, it will be less of a pain than the regular occurrence of people's frustration with these only-read-once-for-security designs.

theGuen commented 2 years ago

Ok, that's right… As I said, it was just a quick hack to make things work with this stupid server… The concurrency should stay… I just tested again with:

sftp.UseConcurrentReads(true)
➜  testSFTP time go run testftp.go
go run testftp.go  0,39s user 0,53s system 19% cpu 4,613 total

sftp.UseConcurrentReads(false)
➜  testSFTP time go run testftp.go
go run testftp.go  0,46s user 0,66s system 4% cpu 24,885 total

And that was just an 18M file.

Maybe something like:

// returns the first matched fileInfo
func (c *Client) ReadDirWithFilter(path string, fileFilter func(string) bool) (fi fs.FileInfo, found bool)

and

func (c *Client) OpenWithFileInfo(path string, fileInfo fs.FileInfo) (*File, error)

with a File like:

// File represents a remote file.
type File struct {
    c      *Client
    path   string
    handle string

    mu     sync.Mutex
    offset int64 // current offset within remote file

    fromFileInfo bool // indicates that the size is known
    size         uint64
    mode         fs.FileMode
}

For these functions, path should maybe be the path without the filename… I guess this would be optional to use and would break nothing.

theGuen commented 2 years ago

Just in case of any misunderstanding: with "this stupid server" I was referring to this (or some similar) server, which deletes files on an Fstat and has the same filename more than once… so if you missed a file, you have to read the newest first, then the older ones, importing them in reverse order… I would never have opened an issue here just to deal with a server like that…

This library is just great and served me well over the last year. Thanks for that!

Since I have to deal with that server, I have now implemented FilterDir and OpenWithFileInfo in a minimally invasive way. If you like, I can create a pull request… On the other hand, if you are rewriting the client, it's maybe not needed…

puellanivis commented 2 years ago

Oh yeah, I totally understood what you were referring to, and yeah, these servers are typically poorly designed for integration with typical engineering practice… I wanted to write "security theater" but decided to avoid it. 😆

I'd probably say I would prefer us not to pick up FilterDir; with a proper look toward the future, we should target compatibility with io/fs.WalkDir.

If we add an OpenWithFileInfo exported function, then we're stuck with it indefinitely. We already have a complete duplicate set of exported error names, because we're kind of stuck with what we've provided before. The central idea is: do not introduce any breaking changes without a major version number increase.