pkg / sftp

SFTP support for the go.crypto/ssh package
BSD 2-Clause "Simplified" License

More optimization #397

Closed puellanivis closed 3 years ago

puellanivis commented 3 years ago

Hm… the benchmarks vary wildly.

As noted, the delayed benchmarks show vastly bigger improvements in speed and throughput.

Allocated memory used by Writes and ReadFrom is down ~99% and ~91% respectively; for Writes+Delay and ReadFrom+Delay, ~75% and ~70% respectively.

Allocs per op are down about 14% for ops <= 32k, but up about 20% for bigger ops.

These are nice:

Write1k-12                   90.6MB/s ± 1%  103.5MB/s ± 1%   +14.21%  (p=0.000 n=10+10)
Write16k-12                   597MB/s ± 2%    807MB/s ± 3%   +35.09%  (p=0.000 n=10+10)
Write32k-12                   742MB/s ± 2%   1067MB/s ± 3%   +43.77%  (p=0.000 n=10+10)
Write128k-12                  932MB/s ± 3%   1335MB/s ± 3%   +43.28%  (p=0.000 n=10+9)
Write512k-12                 1.29GB/s ± 3%   1.52GB/s ± 1%   +17.63%  (p=0.000 n=9+9)
Write1MiB-12                 1.35GB/s ± 3%   1.53GB/s ± 4%   +12.59%  (p=0.000 n=10+10)
Write4MiB-12                 1.45GB/s ± 2%   1.55GB/s ± 4%    +7.25%  (p=0.000 n=10+10)
Write4MiBDelay10Msec-12      38.3MB/s ± 0%  113.2MB/s ± 2%  +195.55%  (p=0.000 n=10+10)
Write4MiBDelay50Msec-12      7.98MB/s ± 0%  25.35MB/s ± 0%  +217.87%  (p=0.000 n=9+10)
Write4MiBDelay150Msec-12     2.68MB/s ± 0%   8.64MB/s ± 0%  +222.28%  (p=0.000 n=10+10)
ReadFrom1k-12                1.64GB/s ± 3%   1.74GB/s ± 4%    +6.27%  (p=0.000 n=10+10)
ReadFrom16k-12               1.52GB/s ± 5%   1.62GB/s ± 5%    +6.46%  (p=0.000 n=10+10)
ReadFrom32k-12               1.48GB/s ± 3%   1.56GB/s ± 7%    +5.35%  (p=0.003 n=10+10)
ReadFrom128k-12              1.46GB/s ± 2%   1.55GB/s ± 3%    +6.36%  (p=0.000 n=10+10)
ReadFrom512k-12              1.43GB/s ± 4%   1.53GB/s ± 4%    +7.29%  (p=0.000 n=10+10)
ReadFrom1MiB-12              1.43GB/s ± 1%   1.54GB/s ± 3%    +7.94%  (p=0.000 n=9+10)
ReadFrom4MiB-12              1.42GB/s ± 4%   1.53GB/s ± 5%    +7.92%  (p=0.000 n=10+10)
ReadFrom4MiBDelay10Msec-12   71.3MB/s ± 0%  111.7MB/s ± 0%   +56.58%  (p=0.000 n=10+9)
ReadFrom4MiBDelay50Msec-12   14.8MB/s ± 0%   23.1MB/s ± 0%   +55.72%  (p=0.000 n=10+6)
ReadFrom4MiBDelay150Msec-12  4.97MB/s ± 0%   7.74MB/s ± 0%   +55.73%  (p=0.000 n=10+10)

Vs. this PR without concurrency updates:

Except for delayed operations, throughput all stays pretty similar.

Delayed Read/Write/ReadFrom all show more speed, much more throughput, and about 20–25% more allocs/op.

ReadFrom without delays uses about 23 times more memory at each buffer size. (Could be bad sync.Pool behavior?)

Writes <= 32k show 90% less memory usage, but bigger writes range from -50% (less) to +27% (more) memory, probably due to greater concurrency being enabled as the buffer size grows. Throughputs are all relatively unchanged.

I think, with the throughputs being mostly a wash and memory usage being down despite a larger number of allocs/op, the tradeoff for better performance on delayed lines is probably a good trade. Well, once I figure out the 23x more memory being used by ReadFrom :laughing:

drakkan commented 3 years ago

Hi,

I did a quick test after applying this patch

0001-replace-sync.Pool-with-our-allocator.zip

the results are quite similar:

name                         old time/op    new time/op    delta
ReadFrom1k-12                  8.12ms ± 6%    8.36ms ± 6%    ~     (p=0.075 n=10+10)
ReadFrom16k-12                 8.45ms ± 3%    8.82ms ± 3%  +4.32%  (p=0.000 n=10+9)
ReadFrom32k-12                 9.13ms ± 6%    8.77ms ± 2%  -4.00%  (p=0.001 n=10+10)
ReadFrom128k-12                9.46ms ± 5%    8.87ms ± 4%  -6.22%  (p=0.000 n=10+10)
ReadFrom512k-12                9.59ms ± 3%    8.90ms ± 2%  -7.15%  (p=0.000 n=9+9)
ReadFrom4MiB-12                9.36ms ± 4%    8.94ms ± 1%  -4.49%  (p=0.000 n=10+9)
ReadFrom4MiBDelay10Msec-12     93.9ms ± 0%    94.0ms ± 0%    ~     (p=0.105 n=10+10)
ReadFrom4MiBDelay50Msec-12      456ms ± 0%     456ms ± 0%    ~     (p=0.497 n=9+10)
ReadFrom4MiBDelay150Msec-12     1.36s ± 0%     1.36s ± 0%    ~     (p=0.529 n=10+10)

name                         old speed      new speed      delta
ReadFrom1k-12                1.29GB/s ± 6%  1.26GB/s ± 7%    ~     (p=0.075 n=10+10)
ReadFrom16k-12               1.24GB/s ± 3%  1.19GB/s ± 3%  -4.15%  (p=0.000 n=10+9)
ReadFrom32k-12               1.15GB/s ± 6%  1.20GB/s ± 2%  +4.09%  (p=0.001 n=10+10)
ReadFrom128k-12              1.11GB/s ± 5%  1.18GB/s ± 5%  +6.60%  (p=0.000 n=10+10)
ReadFrom512k-12              1.09GB/s ± 3%  1.18GB/s ± 2%  +7.68%  (p=0.000 n=9+9)
ReadFrom4MiB-12              1.12GB/s ± 4%  1.17GB/s ± 1%  +4.66%  (p=0.000 n=10+9)
ReadFrom4MiBDelay10Msec-12    112MB/s ± 0%   112MB/s ± 0%    ~     (p=0.127 n=10+10)
ReadFrom4MiBDelay50Msec-12   23.0MB/s ± 0%  23.0MB/s ± 0%    ~     (p=0.283 n=9+10)
ReadFrom4MiBDelay150Msec-12  7.73MB/s ± 0%  7.73MB/s ± 0%    ~     (all equal)

name                         old alloc/op   new alloc/op   delta
ReadFrom1k-12                  2.46MB ± 0%    2.23MB ± 0%  -9.39%  (p=0.000 n=10+10)
ReadFrom16k-12                 2.46MB ± 0%    2.23MB ± 0%  -9.41%  (p=0.000 n=10+10)
ReadFrom32k-12                 2.45MB ± 0%    2.23MB ± 0%  -9.33%  (p=0.000 n=10+10)
ReadFrom128k-12                2.45MB ± 0%    2.23MB ± 0%  -9.34%  (p=0.000 n=9+10)
ReadFrom512k-12                2.45MB ± 0%    2.23MB ± 0%  -9.32%  (p=0.000 n=10+10)
ReadFrom4MiB-12                2.45MB ± 0%    2.23MB ± 0%  -9.24%  (p=0.000 n=10+10)
ReadFrom4MiBDelay10Msec-12     12.9MB ± 0%    12.7MB ± 0%  -1.23%  (p=0.000 n=10+10)
ReadFrom4MiBDelay50Msec-12     12.9MB ± 0%    12.7MB ± 0%  -1.46%  (p=0.000 n=10+10)
ReadFrom4MiBDelay150Msec-12    13.0MB ± 0%    12.7MB ± 0%  -2.10%  (p=0.000 n=9+10)

name                         old allocs/op  new allocs/op  delta
ReadFrom1k-12                   2.59k ± 0%     2.51k ± 0%  -3.33%  (p=0.000 n=10+10)
ReadFrom16k-12                  2.59k ± 0%     2.51k ± 0%  -3.32%  (p=0.000 n=7+6)
ReadFrom32k-12                  2.59k ± 0%     2.51k ± 0%  -3.33%  (p=0.000 n=10+10)
ReadFrom128k-12                 2.59k ± 0%     2.51k ± 0%  -3.34%  (p=0.000 n=10+10)
ReadFrom512k-12                 2.59k ± 0%     2.51k ± 0%  -3.33%  (p=0.000 n=10+10)
ReadFrom4MiB-12                 2.59k ± 0%     2.51k ± 0%  -3.32%  (p=0.000 n=10+10)
ReadFrom4MiBDelay10Msec-12      3.24k ± 0%     3.15k ± 0%  -2.67%  (p=0.000 n=10+10)
ReadFrom4MiBDelay50Msec-12      3.25k ± 0%     3.16k ± 1%  -2.71%  (p=0.000 n=10+10)
ReadFrom4MiBDelay150Msec-12     3.28k ± 1%     3.18k ± 1%  -3.10%  (p=0.000 n=10+10)
drakkan commented 3 years ago

Hi,

with this patch applied I have a failure in SFTPGo test cases here. Basically, the server calls channel.Close() while a client is connected and uploading a file. It seems that the client does not disconnect anymore in this case. I'll try to write a proper standalone reproducer and/or a test case next weekend, thank you

puellanivis commented 3 years ago

I have committed at least somewhat of an improvement on the WriteTo here, but it still doesn’t really do the reads in a particularly good parallelized way. But the reads and writes are at least done in parallel. Getting the reads to work in parallel is going to require a separate goroutine that acts like a work-feed, pushing work to do to the reader, that then does the work.

I’m not sure if I want to do that improvement now, or in a follow up. I think the general improvements here have been waiting long enough, that we can commit this, and cut a release, then follow up with a more parallelized WriteTo in another PR.

P.S.: I mean, after we’ve resolved the apparent regression issue pointed out by @drakkan
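
For reference, the work-feed shape described above might look roughly like this in isolation (a standalone sketch; none of these names are the library's internals):

package main

import (
    "fmt"
    "sync"
)

type chunk struct {
    off int64
    n   int
}

// feedWork emits one chunk descriptor per read request, then closes the channel.
func feedWork(size int64, chunkSize int) <-chan chunk {
    work := make(chan chunk)
    go func() {
        defer close(work)
        for off := int64(0); off < size; off += int64(chunkSize) {
            n := chunkSize
            if rem := size - off; rem < int64(n) {
                n = int(rem)
            }
            work <- chunk{off: off, n: n}
        }
    }()
    return work
}

func main() {
    work := feedWork(10, 3) // a 10-byte "file" split into 3-byte reads
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ { // two concurrent readers
        wg.Add(1)
        go func(worker int) {
            defer wg.Done()
            for c := range work {
                // A real reader would issue an SSH_FXP_READ request here.
                fmt.Printf("worker %d: read %d bytes at offset %d\n", worker, c.n, c.off)
            }
        }(i)
    }
    wg.Wait()
}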

drakkan commented 3 years ago

oops sorry, I'm using my branch and I completely forgot about this issue.

Are you unable to reproduce the issue and do you want a reproducer or a test case? If so I'll try to find some time, thanks

drakkan commented 3 years ago

Hi,

I finally did some tests.

As soon as I close the channel from SFTPGo I see this log client side:

./sftpclienttest 
start copy
copy finished, bytes 3014656 err: Connection Lost

but since I'm limiting bandwidth server side, SFTPGo hasn't received 3014656 bytes yet, so the connection stays open (server side) until SFTPGo receives all the pending data.

So in my test case I just need to increase the wait timeout: it now takes about 5 seconds, while before the pending data was sent in less than 1 second.

There are now more parallel writes and therefore more pending data.

In short, this is not a bug. Sorry for the noise and for being so lazy.

puellanivis commented 3 years ago

Hm, 🤔 I think we would prefer that the ReadFrom does not ever return until all the pending data has been transferred. Though, I guess this test is done by sending something like /dev/zero over a restricted pipe, then suddenly cutting the connection, and then seeing how long it takes both client and server to finish up? So the client ends quite abruptly, because it is the source and immediately notices the disconnect, while on the server there’s a bunch of pending data trickling in?

If so, there is a certain amount of “maxConcurrency” that we should see. It might be good to tune it and find an appropriate timeout value based on that amount of maxConcurrency * maxPacketSize. Then that should hold regardless. Yeah, it’s kind of interesting that even though we’re doing things in parallel, the data on the pipe is still eventually pushed through sequentially, so increasing concurrency == longer stream. 😆
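
As a rough back-of-the-envelope for that maxConcurrency * maxPacketSize tuning, with made-up numbers (a standalone sketch; none of these names are library API):

package main

import (
    "fmt"
    "time"
)

func main() {
    // Assumed values, for illustration only.
    const (
        maxConcurrency = 64        // outstanding write requests
        maxPacketSize  = 32 * 1024 // bytes per request
        bandwidth      = 1 << 20   // throttled link, bytes per second
    )

    // Worst case, this many bytes can still be in flight when the source stops.
    inflight := maxConcurrency * maxPacketSize

    // Time the server needs to drain them at the throttled rate.
    drain := time.Duration(float64(inflight) / float64(bandwidth) * float64(time.Second))
    fmt.Printf("in-flight: %d bytes, drain time ~ %v\n", inflight, drain)
}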

drakkan commented 3 years ago

Hi,

To understand the issue, I used this test program:

package main

import (
    "crypto/rand"
    "fmt"
    "io"
    "io/ioutil"
    "net"
    "os"
    "path/filepath"

    "github.com/pkg/sftp"
    "golang.org/x/crypto/ssh"
)

func main() {
    username := "b"
    password := "password"
    addr := "127.0.0.1:2022"
    // create a big enough test file
    content := make([]byte, 10485760)
    _, err := rand.Read(content)
    if err != nil {
        panic(err)
    }
    testFilePath := filepath.Join(os.TempDir(), "testfile")
    err = ioutil.WriteFile(testFilePath, content, os.ModePerm)
    if err != nil {
        panic(err)
    }

    config := &ssh.ClientConfig{
        User: username,
        HostKeyCallback: func(hostname string, remote net.Addr, key ssh.PublicKey) error {
            return nil
        },
        Auth: []ssh.AuthMethod{ssh.Password(password)},
    }
    conn, err := ssh.Dial("tcp", addr, config)
    if err != nil {
        panic(err)
    }
    sftpClient, err := sftp.NewClient(conn)
    if err != nil {
        panic(err)
    }
    defer sftpClient.Close()

    src, err := os.Open(testFilePath)
    if err != nil {
        panic(err)
    }
    defer src.Close()

    dst, err := sftpClient.Create("testupload")
    if err != nil {
        panic(err)
    }
    defer dst.Close()
    fmt.Printf("start copy\n")
    n, err := io.Copy(dst, src)
    fmt.Printf("copy finished, bytes %v err %v\n", n, err)
}

After the upload starts, I select the connection from the SFTPGo web interface and click "Disconnect", which calls channel.Close() server side.

The test program exits printing something like this:

./sftpclienttest 
start copy
copy finished, bytes 2752512 err Connection Lost

from the SFTPGo web interface I see something like this:

[Screenshot of the SFTPGo web interface, 2021-01-23 13:09]

so the connection is still alive even if the sftpclient program is now dead.

The connection ends when the received size matches the one printed by the client. I think this data is accumulated inside a crypto/ssh internal buffer.

I could be wrong but I don't see any issue here:

1) the client disconnects as soon as the channel is closed
2) the server reads all the accumulated data and then closes the connection

drakkan commented 3 years ago

If I repeat the test using pkg/sftp v1.12 I have this output client side:

./sftpclienttest 
start copy
copy finished, bytes 2457600 err <nil>

so no error is detected, you fixed this, great!

Server side I have:

"Unexpected error for transfer, path: \"/tmp/test_home/testupload\", error: \"unexpected EOF\" bytes sent: 0, bytes received: 2457600 transfer running since 18750 ms"

the received bytes (2457600 in this case) always match the ones printed by the client.

If I use this branch the received bytes are always a bit smaller, for example:

./sftpclienttest 
start copy
copy finished, bytes 3080192 err Connection Lost

and the server logs:

"Unexpected error for transfer, path: \"/tmp/test_home/testupload\", error: \"unexpected EOF\" bytes sent: 0, bytes received: 3014656 transfer running since 23000 ms"

Is this what you meant in the comment above?

puellanivis commented 3 years ago

I was just trying to understand the scope of what you were reporting. The fewer bytes on the server side could be from an error happening before the end of the full transfer, in which case that error takes priority over completing the transfer.

As long as you’re good with things, I’m probably good with things. Although, I’ve kind of wanted to build some tests of the parallel execution to make sure that it is actually committing things in order, but I think it would require reworking the ReadFrom to accommodate an arbitrary max data size, so that it splits up a file very finely, and we can make sure the transfer still all happens in order. That way we wouldn’t need a super large file to get the parallelism to activate. I’ll take a look, but it might be doable as a patch-level change?

drakkan commented 3 years ago

Hi,

I'm only reporting the difference between v1.12/git master and this branch, just in case you want to investigate.

I think losing some bytes in error cases is not a problem

puellanivis commented 3 years ago

Agree, dropping bytes is not automatically bad in an error state. At least, as long as any written value reflects actual sequential bytes written.

Hm. :thinking: But also, now that I think about it, it’s probably technically possible that we could write something with this code like AAABB0CCC and written == 5. This might cause issues in some cases, where it is assumed that written == 5 means no bytes were ever written after 5, rather than just that only 5 bytes were sequentially written.

Although, without reverting to a sequential writing method, I don’t think there is any way to guarantee that we don’t possibly perform an overwrite like this… But then, callers of io.Reader.Read() need to regularly truncate their read buffers to the length actually read. So maybe, as long as we document this behavior well (which should still only ever happen in an error situation), we might be able to avoid confusion.

I’ll work on drafting some language. Should be something like: “Note that in the case of an error, written will reflect the full length of sequential bytes written without error; however, due to parallel work, additional bytes may have been written after that length. You may need to Truncate() the file to this length after an error to prevent bugs.”
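
To make that drafted guidance concrete, here is a minimal caller-side sketch (assumed imports: fmt, io, github.com/pkg/sftp), under the assumption that dst is an *sftp.File that was created empty, so truncating cannot destroy pre-existing data:

// uploadTruncateOnError is a sketch only: copy src into dst and, if the copy
// fails, truncate dst to the reported sequential length so that any
// out-of-order bytes written past it are discarded.
func uploadTruncateOnError(dst *sftp.File, src io.Reader) (int64, error) {
    written, err := io.Copy(dst, src) // uses dst.ReadFrom under the hood
    if err != nil {
        if terr := dst.Truncate(written); terr != nil {
            return written, fmt.Errorf("copy failed: %v (truncate also failed: %v)", err, terr)
        }
    }
    return written, err
}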

drakkan commented 3 years ago

Hi,

please consider the upload/resume use case:

If the written bytes are not sequential the resume will produce a corrupted file. Am I missing something? (I haven't read the code)

puellanivis commented 3 years ago

Yes, that’s exactly the case that I’m referring to. The problem is that we cannot just automatically Truncate the file ourselves, because of this case:

However, if we’re transferring a file, and there is no pre-existing data then we have:

drakkan commented 3 years ago

Yes, that’s exactly the case that I’m referring to. The problem is that we cannot just automatically Truncate the file ourselves, because of this case:

  • File contains: AAABBBCCCDDD.
  • we try to write XXXYYYZZZ to File, so that we want to end up with XXXYYYZZZDDD.
  • we actually write only XXX at offset 0, YY at offset 3, and ZZZ at offset 6, so we end up with XXXYYBZZZDDD returning a written == 5.
  • If we use File.Truncate(5) here, we’ll destroy the file, and in particular truncate out the DDD data that might not be recoverable.
  • The code should recover by knowing 5 bytes were written correctly, so it only needs to write YZZZ at offset 5.
  • we then write YZZ at offset 5, and Z at offset 8, giving the correct XXXYYYZZZDDD.

However, if we’re transferring a file, and there is no pre-existing data then we have:

  • File contains: `` (nothing).
  • we try to write XXXYYYZZZ to File, so that we end up with XXXYYYZZZ.
  • we actually write only XXX at offset 0, YY at offset 3, and ZZZ at offset 6, so we end up with XXXYY0ZZZ returning written == 5.
  • we got an error, and we know there is no valid data after the written == 5, so we can just File.Truncate(5) and we get: XXXYY.
  • We can now either quit the program, and/or restart the write from offset 5 ourselves.

this will probably not work if the client is killed/forcibly terminated or similar. It is a separate/additional step.

Please read this part of the specs:

 SSH_FXF_APPEND_DATA
      Data is always written at the end of the file.  The offset field
      of SSH_FXP_WRITE requests is ignored.

What do you think about making non-sequential writes configurable?

puellanivis commented 3 years ago

I can definitely change the code around to do sequential confirmed writes, but that is also going to kill a lot of the performance gains for high latency connections.

On the other hand, it makes the most common case code extremely simplified.

drakkan commented 3 years ago

I can definitely change the code around to do sequential confirmed writes, but that is also going to kill a lot of the performance gains for high latency connections.

On the other hand, it makes the most common case code extremely simplified.

If we don't want to make sequential writes configurable I think this is the best thing to do.

From what I understand, sequential writes can lead to unexpected bugs if the client process is forcibly terminated.

We still need to allow sequential writes for SFTP servers that ignore the offset in SSH_FXF_APPEND_DATA (our current server implementation never ignores offsets and OpenSSH seems to do the same).
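
For that append-only case, a hedged sketch of how a client asks for it (assuming client is an *sftp.Client as in the earlier reproducer, and remote.log is just a placeholder path); with SSH_FXF_APPEND_DATA semantics the server ignores write offsets, so such a transfer has to stay sequential:

// Sketch only: open a remote file in append mode.
f, err := client.OpenFile("remote.log", os.O_WRONLY|os.O_CREATE|os.O_APPEND)
if err != nil {
    panic(err)
}
defer f.Close()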

Thank you for all these efforts!

puellanivis commented 3 years ago

If we don't want to make sequential writes configurable I think this is the best thing to do.

From what I understand, sequential writes can lead to unexpected bugs if the client process is forcibly terminated.

Do you mean non-sequential writes here? Because sequential writes are kind of the most reliable way to avoid bugs, even if the client process terminates. At least in that case there isn’t a possibility of an out-of-order write to a file.

I’m thinking I’ll make two parallel types of functions, one sequential (like readFromSequential now), and the other… I don’t know what best to call it. :laughing: Then we can consider if we want to have a ReadFrom that can switch between the two, so we can use an option flag to switch between them and default to sequential, or if we just straight make ReadFrom sequential, and to get the out-of-order parallel one you have to call a separate function type.

My only remaining concern then is the WriteAt (and thus `Write` as well) function itself; it’s also doing a lot of potentially out-of-order writing when given large buffers.

drakkan commented 3 years ago

If we don't want to make sequential writes configurable I think this is the best thing to do. From what I understand, sequential writes can lead to unexpected bugs if the client process is forcibly terminated.

Do you mean non-sequential writes here? Because sequential writes are kind of the most reliable way to avoid bugs, even if the client process terminates. At least in that case there isn’t a possibility of an out-of-order write to a file.

yes, sorry for the typo

I’m thinking I’ll make two parallel types of functions, one sequential (like readFromSequential now), and the other… I don’t know what best to call it. Then we can consider if we want to have a ReadFrom that can switch between the two, so we can use an option flag to switch between them and default to sequential, or if we just straight make ReadFrom sequential, and to get the out-of-order parallel one you have to call a separate function type.

As long as it is configurable, it's good for me.

My only remaining concern then is the WriteAt (and thus `Write` as well) function itself; it’s also doing a lot of potentially out-of-order writing when given large buffers.

the user has to explicitly set a max packet > 32768 to trigger this usage. If so, I think a documentation note is enough

puellanivis commented 3 years ago

the user has to explicitly set a max packet > 32768 to trigger this usage. If so, I think a documentation note is enough

No, this happens any time the write buffer is > max packet, regardless of whatever the user has set explicitly. (My idea on the out-of-order read/write testing was even to set a maxPacket = 3 or something to force massively parallel reading/writing.)

So by default, if I try to write, say, a 64k buffer to an SFTP file, it will get split up into two packets and sent down the line in parallel.
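
A tiny standalone illustration of that splitting (the constant and loop are illustrative only, not the library's internals):

package main

import "fmt"

const maxPacket = 32 * 1024

func main() {
    buf := make([]byte, 64*1024) // a single 64k Write
    for off := 0; off < len(buf); off += maxPacket {
        end := off + maxPacket
        if end > len(buf) {
            end = len(buf)
        }
        // Each iteration corresponds to one SSH_FXP_WRITE packet, which can
        // be sent concurrently with the others.
        fmt.Printf("packet: offset=%d length=%d\n", off, end-off)
    }
}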

drakkan commented 3 years ago

the user has to explicitly set a max packet > 32768 to trigger this usage. If so, I think a documentation note is enough

No, this happens any time the write buffer is > max packet, regardless of whatever the user has set explicitly. (My idea on the out-of-order read/write testing was even to set a maxPacket = 3 or something to force massively parallel reading/writing.)

So by default, if I try to write, say, a 64k buffer to an SFTP file, it will get split up into two packets and sent down the line in parallel.

Oh yes

if len(b) <= f.c.maxPacket

but since io.Copy uses a 32768-byte buffer I still think a doc note could be enough

puellanivis commented 3 years ago

Right, but io.Copy is going to call into ReadFrom or WriteTo if those are implemented, rather than calling into Write itself.
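
A small standalone demonstration of that dispatch, using only the standard library (the source is wrapped in io.LimitReader so that io.Copy cannot take the src.WriteTo path instead):

package main

import (
    "fmt"
    "io"
    "strings"
)

// sink implements both io.Writer and io.ReaderFrom; io.Copy prefers ReadFrom.
type sink struct{}

func (sink) Write(p []byte) (int, error) {
    fmt.Println("Write called")
    return len(p), nil
}

func (sink) ReadFrom(r io.Reader) (int64, error) {
    fmt.Println("ReadFrom called")
    return io.Copy(io.Discard, r)
}

func main() {
    src := io.LimitReader(strings.NewReader("hello"), 5)
    _, _ = io.Copy(sink{}, src) // prints "ReadFrom called", not "Write called"
}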

puellanivis commented 3 years ago

OK, default behavior is now to avoid concurrent/out-of-order writes. You must explicitly enable it with UseConcurrentWrites().
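
For reference, opting in from client code would look roughly like this (a sketch assuming the option is exposed as a ClientOption taking a bool, with conn being the *ssh.Client from the earlier reproducer; check the final API for the exact form):

// Sketch only: explicitly enable concurrent (possibly out-of-order) writes.
sftpClient, err := sftp.NewClient(conn, sftp.UseConcurrentWrites(true))
if err != nil {
    panic(err)
}
defer sftpClient.Close()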

I have also added concurrent reads to WriteTo(), which is still safe, since it remains a sequential write process.

I found a bug in my ReadAt that started from f.offset instead of from the given off argument. :grimacing:

I also realized that there was a race condition accessing and setting f.offset in File.Seek(). Now, every subfunction that accesses f.offset should be properly locked.

I’ve polished all the other code as well. I’ve tried to document all the concurrent code well enough.

It would be great if we could get some good solid testing on this, and/or more eyeballs on the code. :cold_sweat:

drakkan commented 3 years ago

@puellanivis thank you for working on this.

I did some tests and the issue reported above is fixed. Also, the SFTPGo test cases have no issues now. I also did some manual testing, looking at used file descriptors etc., and all seems fine.

I haven't read the code in depth; at first look it seems OK.

I wonder why some packet methods have a pointer receiver and others don't. What do you think about something like this?

diff --git a/client.go b/client.go
index 8ea9261..5ce2726 100644
--- a/client.go
+++ b/client.go
@@ -916,7 +916,7 @@ func (f *File) Read(b []byte) (int, error) {
 func (f *File) readChunkAt(ch chan result, b []byte, off int64) (n int, err error) {
        for err == nil && n < len(b) {
                id := f.c.nextID()
-               typ, data, err := f.c.sendPacket(ch, sshFxpReadPacket{
+               typ, data, err := f.c.sendPacket(ch, &sshFxpReadPacket{
                        ID:     id,
                        Handle: f.handle,
                        Offset: uint64(off) + uint64(n),
@@ -1282,7 +1282,7 @@ func (f *File) Write(b []byte) (int, error) {
 }

 func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) {
-       typ, data, err := f.c.sendPacket(ch, sshFxpWritePacket{
+       typ, data, err := f.c.sendPacket(ch, &sshFxpWritePacket{
                ID:     f.c.nextID(),
                Handle: f.handle,
                Offset: uint64(off),
diff --git a/packet.go b/packet.go
index 55a6021..18d0b20 100644
--- a/packet.go
+++ b/packet.go
@@ -603,9 +603,9 @@ type sshFxpReadPacket struct {
        Handle string
 }

-func (p sshFxpReadPacket) id() uint32 { return p.ID }
+func (p *sshFxpReadPacket) id() uint32 { return p.ID }

-func (p sshFxpReadPacket) MarshalBinary() ([]byte, error) {
+func (p *sshFxpReadPacket) MarshalBinary() ([]byte, error) {
        l := 4 + 1 + 4 + // uint32(length) + byte(type) + uint32(id)
                4 + len(p.Handle) +
                8 + 4 // uint64 + uint32
@@ -716,9 +716,9 @@ type sshFxpWritePacket struct {
        Data   []byte
 }

-func (p sshFxpWritePacket) id() uint32 { return p.ID }
+func (p *sshFxpWritePacket) id() uint32 { return p.ID }

-func (p sshFxpWritePacket) marshalPacket() ([]byte, []byte, error) {
+func (p *sshFxpWritePacket) marshalPacket() ([]byte, []byte, error) {
        l := 4 + 1 + 4 + // uint32(length) + byte(type) + uint32(id)
                4 + len(p.Handle) +
                8 + // uint64
@@ -734,7 +734,7 @@ func (p sshFxpWritePacket) marshalPacket() ([]byte, []byte, error) {
        return b, p.Data, nil
 }

-func (p sshFxpWritePacket) MarshalBinary() ([]byte, error) {
+func (p *sshFxpWritePacket) MarshalBinary() ([]byte, error) {
        header, payload, err := p.marshalPacket()
        return append(header, payload...), err
 }
diff --git a/packet_test.go b/packet_test.go
index 8b16be6..0ba78c7 100644
--- a/packet_test.go
+++ b/packet_test.go
@@ -155,7 +155,7 @@ var sendPacketTests = []struct {
                Pflags: flags(os.O_RDONLY),
        }, []byte{0x0, 0x0, 0x0, 0x15, 0x3, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x4, 0x2f, 0x66, 0x6f, 0x6f, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0}},

-       {sshFxpWritePacket{
+       {&sshFxpWritePacket{
                ID:     124,
                Handle: "foo",
                Offset: 13,
@@ -321,7 +321,7 @@ func BenchmarkMarshalOpen(b *testing.B) {
 func BenchmarkMarshalWriteWorstCase(b *testing.B) {
        data := make([]byte, 32*1024)
        for i := 0; i < b.N; i++ {
-               sp(sshFxpWritePacket{
+               sp(&sshFxpWritePacket{
                        ID:     1,
                        Handle: "someopaquehandle",
                        Offset: 0,
@@ -334,7 +334,7 @@ func BenchmarkMarshalWriteWorstCase(b *testing.B) {
 func BenchmarkMarshalWrite1k(b *testing.B) {
        data := make([]byte, 1024)
        for i := 0; i < b.N; i++ {
-               sp(sshFxpWritePacket{
+               sp(&sshFxpWritePacket{
                        ID:     1,
                        Handle: "someopaquehandle",
                        Offset: 0,

If I understand the code correctly, the issue reported here is not fixed; am I wrong?

puellanivis commented 3 years ago

After some (annoying to set up, but) quick benchmarks, yeah, it does look like using pointers here would work slightly better.

drakkan commented 3 years ago

After some (annoying to set up, but) quick benchmarks, yeah, it does look like using pointers here would work slightly better.

Thank you for confirming, do you want to integrate this change yourself or do you prefer a separate PR?

drakkan commented 3 years ago

Thank you! I think this PR should be merged now so that others can test it too. Do you have other pending changes?

puellanivis commented 3 years ago

I don’t think there’s anything on my todo list, so it should be good to merge. Then we can maybe get some people to update to bleeding-edge master and give it a good workout ahead of cutting a release.

drakkan commented 3 years ago

This branch is used in SFTPGo main now, thank you

drakkan commented 3 years ago

@puellanivis thanks for this feature. I never thought I'd use concurrent writes, but they are very useful on high-latency networks: I saw a performance improvement from 5MB/s to 15MB/s by enabling them, great!