psanford / wormhole-william

End-to-end encrypted file transfer. A magic wormhole CLI and API in Go (golang).
MIT License
1.07k stars 54 forks source link

Add method to get received text directly #75

Closed Jacalz closed 2 years ago

Jacalz commented 2 years ago

This is a proposal to add a new method called GetText() for receives. It allows getting the text directly without having to go through a reader. The benefit here is to avoid the allocations and slowdown caused by having to read through and write to a buffer to get the received string.

psanford commented 2 years ago

I'm pretty cautious about adding new methods to the public API. In this case, I have a number of concerns:

If this method is necessary to fix a performance issue, please provide your profiling methodology and results that demonstrate the issue and that the fix makes a non-negligible improvement.

Jacalz commented 2 years ago

I can understand that you are cautious about it. I did initial benchmarks before opening this PR but I have done even more extensive testing today. The benchmarking code can be found here.

I have put together my benchmarks to be as representative of the wormhole-william code as possible. The takeaway here is that getting the string directly is 69 times faster than the fastest way to read through a string and store it in a bufferr while also allocating nothing in the process. Another interesting thing that I found out is that the grow method actually is slower than not doing it because growing also allocates more capacity than what is strictly needed.

[jacob@fedora test]$ go test -bench .
goos: linux
goarch: amd64
pkg: test
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkBytesBufferNoGrow-8    19102420            52.86 ns/op       48 B/op          1 allocs/op
BenchmarkBytesBuffer-8          10256802           111.7 ns/op       176 B/op          2 allocs/op
BenchmarkBytesBufferString-8    10057257           112.6 ns/op       176 B/op          2 allocs/op
BenchmarkReadAll-8              10598018           103.2 ns/op       512 B/op          1 allocs/op
BenchmarkReadAllFast-8          21035482            49.11 ns/op      128 B/op          1 allocs/op
BenchmarkGetString-8            1000000000           0.7096 ns/op          0 B/op          0 allocs/op
PASS
ok      test    6.687s

I think this new method will be a much more elegant way to get the string contents. Just as you just pass a string when sending text, you can now also get a string when receiving. It is not only faster, but it also is a lot less code and is a bit clearer to read than all the buffers and copies that are otherwise used.

psanford commented 2 years ago

Sorry, I didn't mean a micro benchmark, I meant a profile of an actual application using the library where this is a meaningful performance bottleneck. Its hard for me to image that this would even show up in a profile considering applications are doing encryption, network io, and waiting for human input. Those are all going to be orders of magnitude slower than a 50-100ns improvement shown in your benchmark.

Jacalz commented 2 years ago

The time saved by this does improve a lot more with very large text receives. However, it's probably not going to show up in any profiles until texts are very long. What I'm basically trying to avoid here is duplicating the received text twice in memory. It's not necessarily the speed up that I'm the most interested in.

Jacalz commented 2 years ago

I'd be happy if you have any other suggestions for avoiding saving the text twice. There might be another way to solve what I'm trying to do here.

psanford commented 2 years ago

Sending very large text messages is basically an abuse of the mailbox server, you really shouldn't do that. The text protocol doesn't attempt to make a direct connection or tcp relay'ed connection between the two sides. Instead it just sticks the message itself (encrypted) into the mailbox. That means the mailbox server has to store that message until the other side reads it. If a bunch of clients do this, they could consume all the resources on the mailbox server.

Is there a reason you are sending large messages as text instead of as a file transfer? File transfer has the nice property of being a streaming protocol so you don't end up with multiple copies of the payload in memory at once. It also won't DOS the mailbox server.

Jacalz commented 2 years ago

Is there a reason you are sending large messages as text instead of as a file transfer? File transfer has the nice property of being a streaming protocol so you don't end up with multiple copies of the payload in memory at once. It also won't DOS the mailbox server.

Because I didn't know that files would be better ;) Thanks for the discussion. I'll go ahead and close this then.

piegamesde commented 2 years ago

Maybe we should be a bit more proactive on rate limiting plain Wormhole messages?

psanford commented 2 years ago

Seems like the mailbox server should have a limit on the size of messages it will store in a single mailbox.