pkg / sftp

SFTP support for the go.crypto/ssh package
BSD 2-Clause "Simplified" License

Server: Provide sequential read/write API #586

Closed klauspost closed 1 month ago

klauspost commented 1 month ago

The FileReader and FileWriter interfaces require and use io.ReaderAt and io.WriterAt respectively.

While this may be preferable in rare cases, it will mostly be slower, and sometimes much slower.

In MinIO we see transfer speeds that are orders of magnitude slower, since we have to serve each 32K request separately, leading to excessive seeking. For writes we added a buffer to make them sequential.

I have something of a dislike for the *At interfaces, and I ask you to consider adding pure io.Reader and io.Writer interfaces, since that would make implementations much simpler in all cases where you aren't serving files from disk.

puellanivis commented 1 month ago

I understand that ReadAt and WriteAt create complications in your situation. Unfortunately, there is no way to guarantee that the connecting client will actually make only sequential reads and writes, so the ReaderAt and WriterAt interfaces are a necessary design factor that cannot be worked around.

klauspost commented 1 month ago

Ah, so it is client dependent even. How painful. OK. sftp will have to be slow then.

drakkan commented 1 month ago

Reordering in memory can also be problematic. Think of a client that uploads or downloads a file from end to beginning, for example. This is theoretically possible, although I hope no client actually does it.

klauspost commented 1 month ago

Yeah. We are forced to do buffering with uploads. Though it seems we are missing a backstop for the buffering.

drakkan commented 1 month ago

> Yeah. We are forced to do buffering with uploads. Though it seems we are missing a backstop for the buffering.

Yes, I quickly looked at MinIO code. In SFTPGo I use pipeat which works in all cases but has obvious drawbacks.

I plan to work on a memory-based queue that can fall back to separate fixed-size disk files when needed. These unlinked disk files will be created only if absolutely required and deleted as soon as they are no longer needed. This way a memory queue is used for most common clients, and the fallback does not force the entire file to be kept on disk, only small portions of it.

klauspost commented 1 month ago

Thanks for sharing "pipeat" - good tool for the toolbox.

It works "fine" for us for uploads.

sftp is a secondary API for us, so we can live with some quirky behavior, like a 100MB maximum offset - and we don't support resuming to begin with anyway. Using disk for temporary storage would be too messy, so this will have to do for now.

Without diving deeper, I had assumed ReadAt/WriteAt was just an implementation choice. But if clients can send/receive in arbitrary order, I see how it exposes the lowest level possible, and we will have to deal with it.