HuakunShen closed this 1 month ago
Cool, some of those numbers might be suitable to add to "the ecosystem doc" (https://github.com/magic-wormhole/magic-wormhole/blob/master/docs/ecosystem.rst) or somewhere in the general protocols repo.
There could be many reasons why the Mac build is slower in your specific case. It's impossible to tell unless you profile it. It would be interesting to know. Just to make sure: are you compiling for aarch64, not x86_64?
I found the likely issue. I stepped through the Python and Go versions with a debugger and found that the Rust implementation uses a different chunk size for each send. The Python and Go implementations both send 16KB at a time, while the Rust version sends 4KB at a time.
After setting chunk size to 16KB I get full speed.
File sending starts from here https://github.com/magic-wormhole/magic-wormhole/blob/02407c4aa4cc3f8d8cd01d549fdc72a5f5d77010/src/wormhole/cli/cmd_send.py#L442
    fs = basic.FileSender()
    with self._timing.add("tx file"):
        with progress:
            if filesize:
                # don't send zero-length files
                yield fs.beginFileTransfer(
                    self._fd_to_send,
                    record_pipe,
                    transform=_count_and_hash)
The chunk size is defined in the Twisted class twisted.protocols.basic.FileSender.
File chunks are read here https://github.com/twisted/twisted/blob/02a2b658cd1ade5d7f41f97d898913686313e615/src/twisted/protocols/basic.py#L892
CHUNK_SIZE is a constant defined as CHUNK_SIZE = 2**14 (which is 16384 bytes, 16KB) at https://github.com/twisted/twisted/blob/02a2b658cd1ade5d7f41f97d898913686313e615/src/twisted/protocols/basic.py#L857
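To illustrate what a fixed chunk size means for the number of sends, here is a minimal standalone sketch of a chunked read loop. It does not use Twisted and is not the FileSender code itself; `iter_chunks` and `count_chunks` are hypothetical helper names, and only the value 2**14 comes from the source linked above.

```python
import io

# 16384 bytes, the CHUNK_SIZE value quoted above from Twisted.
CHUNK_SIZE = 2**14


def iter_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield successive blocks of at most chunk_size bytes until EOF.

    Hypothetical helper: a simplified stand-in for a chunked file sender.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def count_chunks(total_bytes, chunk_size):
    """Count how many chunks a payload of total_bytes is split into."""
    payload = io.BytesIO(bytes(total_bytes))
    return sum(1 for _ in iter_chunks(payload, chunk_size))


if __name__ == "__main__":
    one_mib = 1 << 20
    print("16KB chunks per MiB:", count_chunks(one_mib, 2**14))
    print(" 4KB chunks per MiB:", count_chunks(one_mib, 4096))
```

With 4KB chunks the sender handles four times as many chunks for the same file, so any fixed per-chunk cost is paid four times as often. Whether that per-chunk cost is what limits the Rust client here is an assumption, not something measured in this thread.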
The Go implementation also defines the chunk size as 16KB (recordSize := (1 << 14)).
Rust is using 4KB.
I changed 4096 to 16384 and built a release build, which gives me 117MB/s when the sender is the M1 Pro Mac.
I think the Rust implementation can also use 16KB.
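As a rough sanity check on these numbers, the arithmetic can be sketched as follows. This assumes 1 MB = 1,000,000 bytes and ignores protocol overhead; the function names are hypothetical.

```python
def link_limit_mb_per_s(link_mbps):
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte)."""
    return link_mbps / 8


def records_per_second(throughput_mb_per_s, chunk_size_bytes):
    """Chunks the sender must emit per second to sustain a given throughput.

    Assumes 1 MB = 1,000,000 bytes.
    """
    return throughput_mb_per_s * 1_000_000 / chunk_size_bytes


if __name__ == "__main__":
    print("1000 Mbps link limit (MB/s):", link_limit_mb_per_s(1000))
    print("chunks/s at 117 MB/s, 16KB:", round(records_per_second(117, 16384)))
    print("chunks/s at 117 MB/s,  4KB:", round(records_per_second(117, 4096)))
```

So 117MB/s is close to the 125MB/s ceiling of a 1000 Mbps link, and reaching it with 4KB chunks would require roughly four times as many sends per second as with 16KB chunks.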
I tested with the Rust and Python clients, as well as the Go implementation at https://github.com/psanford/wormhole-william/
To get consistent speed measurements, the Python client is always used as the receiver and for measuring speed.
All tests were run on my 1000 Mbps network.
So only sending from the M1 Mac with the Rust wormhole client has a performance issue.
I wonder what the bottleneck could be. Given that the Python and Go versions can saturate the full bandwidth of my network, there must be something wrong with the Rust implementation.
My guess is that Ubuntu doesn't have the problem because of its much higher single-core frequency.