pkg / sftp

SFTP support for the go.crypto/ssh package
BSD 2-Clause "Simplified" License

custom interface fails to trigger concurrent reads in `ReadFrom` #587

Closed emar-kar closed 4 months ago

emar-kar commented 4 months ago

Hello there.

I have a project where I need to upload different files via SFTP, and I use concurrent reads/writes with this config:

sftpCl, err := sftp.NewClient(
    sshConn,
    sftp.UseConcurrentReads(true),
    sftp.UseConcurrentWrites(true),
    sftp.MaxConcurrentRequestsPerFile(64),
    sftp.MaxPacket(32768), // Maximum allowed packet size, in bytes.
)

In addition, I need graceful shutdown, so I wrapped my file reader with a context:

type ReaderWithContext struct {
    ctx context.Context
    r   io.Reader
}

func (r *ReaderWithContext) Read(p []byte) (int, error) {
    if err := r.ctx.Err(); err != nil {
        return 0, err
    }

    return r.r.Read(p)
}

But I noticed a huge regression in speed: my ~15 MB of test data took more than 1 min to upload, compared to ~12 sec without the custom interface. After debugging, I found that the actual problem is inside the ReadFrom function: it only determines remain if the io.Reader implements Len(), Size() or Stat(), or if it is an io.LimitedReader. In all other cases it reads and writes without concurrency. I assume this is a way to handle the situation where remain is 0, which may signal io.EOF or a reader error: a single Read is performed to find out, instead of starting all the goroutines for concurrent reads/writes.

I can solve my situation with an explicit call to ReadFromWithConcurrency, or by wrapping my context reader in an io.LimitedReader. But maybe the ReadFrom docs could get an addition that explains or warns about this specific behaviour? Or the case where remain is 0 could be covered by attempting to call ReadFromWithConcurrency anyway?

puellanivis commented 4 months ago

We have had corner cases where we just want to be extra safe that we don’t use concurrency if we cannot know that the operations are safe.

However, I’m unsure how much, if at all, these concerns still apply since we reworked how concurrency works. Now, I think we’re just trying to avoid spinning off a ton of goroutines only for them to not pick up any work.

emar-kar commented 4 months ago

@puellanivis Thank you for the quick response.

> we’re just trying to avoid spinning off a ton of goroutines only for them to not pick up any work

Yeah, that was my guess.

> ...we don’t use concurrency if we cannot know that the operations are safe.

Maybe it can be included in the docs for ReadFrom, to make it clear without looking through the sources?

UPD: Or the docs could recommend calling ReadFromWithConcurrency with a concurrency of 0, which applies the default from the client options.

puellanivis commented 4 months ago

I’m not opposed to either or both changes. If you have a good idea for the message you could open a PR with a suggestion. Otherwise, I’ll look into getting to it.

puellanivis commented 4 months ago

Great job, thanks!