magic-wormhole / magic-wormhole.rs

Rust implementation of Magic Wormhole, with new features and enhancements
European Union Public License 1.2

Performance issue on Mac with M1 pro #224

Closed: HuakunShen closed this issue 1 month ago

HuakunShen commented 2 months ago

I tested the Rust and Python clients, as well as the Go implementation at https://github.com/psanford/wormhole-william/

To keep the measurements consistent, the Python client is always used as the receiver, and the speeds are the ones it reports.

All tests were run on my 1000 Mbps network.

| Sender Computer | Sender Client | Receiver Computer | Receiver Client | Speed |
| --- | --- | --- | --- | --- |
| M1 pro Mac | python | Ubuntu i7 13700K | python | 112 MB/s |
| M1 pro Mac | rust | Ubuntu i7 13700K | python | 73 MB/s |
| M1 pro Mac | golang | Ubuntu i7 13700K | python | 117 MB/s |
| Ubuntu i7 13700K | python | M1 pro Mac | python | 115 MB/s |
| Ubuntu i7 13700K | rust | M1 pro Mac | python | 116 MB/s |
| Ubuntu i7 13700K | golang | M1 pro Mac | python | 117 MB/s |
| Ubuntu i7 13700K | python | Kali VM (on Mac) | python | 119 MB/s |
| Kali VM (on Mac) | python | Ubuntu i7 13700K | python | 30 MB/s |
| Ubuntu i7 11800H | rust | Ubuntu i7 13700K | python | 116 MB/s |
| Ubuntu i7 13700K | rust | Ubuntu i7 11800H | python | 116 MB/s |

So, apart from the Kali VM, only sending from the M1 Mac with the Rust client shows a performance issue.

Wonder what could be the bottleneck. Given that the Python and Go versions can use the full bandwidth of my network, there must be something wrong with the Rust implementation.
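For a sense of scale, the measured speeds can be compared with the raw line rate of the link. The sketch below is only arithmetic, and it assumes that "MB/s" means 10**6 bytes per second and that the link runs at exactly 1000 Mbit/s; the function names are made up for illustration. Protocol overhead (Ethernet, IP, TCP) is ignored, so 100% is not reachable in practice.

```python
# Assumption: "MB/s" means 10**6 bytes per second and the link is exactly 1000 Mbit/s.
LINK_MBIT_PER_S = 1000


def line_rate_mb_per_s(link_mbit_per_s):
    """Raw line rate in MB/s: 8 bits per byte, protocol overhead ignored."""
    return link_mbit_per_s / 8


def utilization(measured_mb_per_s, link_mbit_per_s=LINK_MBIT_PER_S):
    """Fraction of the raw line rate that a measured speed represents."""
    return measured_mb_per_s / line_rate_mb_per_s(link_mbit_per_s)


if __name__ == "__main__":
    # Speeds taken from the rows where the M1 Mac is the sender.
    for client, speed in [("python", 112), ("rust", 73), ("golang", 117)]:
        print(f"{client}: {speed} MB/s = {utilization(speed):.1%} of line rate")
```

Under those assumptions the raw line rate is 125 MB/s, so 117 MB/s is about 94% of it and 73 MB/s is about 58%.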

My guess is that the Ubuntu machines don't have this problem because of their much higher single-core clock frequency.

meejah commented 2 months ago

Cool, those numbers might be suitable to add to "the ecosystem doc" (https://github.com/magic-wormhole/magic-wormhole/blob/master/docs/ecosystem.rst) or somewhere in the general protocols repo.

felinira commented 2 months ago

> Wonder what could be the bottleneck.

There could be many reasons why the Mac implementation is slower in your specific case. It's impossible to tell unless you profile it. It could be interesting to know. Just to make sure: are you compiling for aarch64, not x86_64?

HuakunShen commented 2 months ago

I found the likely cause. I stepped through the Python and Go versions with a debugger and found that the Rust implementation uses a different chunk size for each send. The Python and Go implementations both send 16 KB at a time, while the Rust version sends 4 KB at a time.

After setting the chunk size to 16 KB, I get full speed.
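To see why the chunk size can matter, it helps to count how many chunks a transfer needs at each size. The sketch below is only arithmetic; the 1 GiB file size and the function name are assumptions for illustration. Whether the cost per chunk is really what slows the M1 sender is a guess, since nothing in this thread was profiled.

```python
def chunks_needed(file_size, chunk_size):
    """Number of chunks needed to send file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


if __name__ == "__main__":
    file_size = 2 ** 30  # hypothetical 1 GiB file
    for chunk_size in (4096, 16384):
        count = chunks_needed(file_size, chunk_size)
        print(f"{chunk_size} byte chunks: {count} sends")
```

With 4 KB chunks the sender goes through its send path four times as often as with 16 KB chunks, so any fixed cost per chunk is paid four times as often.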

Python

File sending starts here: https://github.com/magic-wormhole/magic-wormhole/blob/02407c4aa4cc3f8d8cd01d549fdc72a5f5d77010/src/wormhole/cli/cmd_send.py#L442

fs = basic.FileSender()

with self._timing.add("tx file"):
    with progress:
        if filesize:
            # don't send zero-length files
            yield fs.beginFileTransfer(
                self._fd_to_send,
                record_pipe,
                transform=_count_and_hash)

The chunk size is defined in the Twisted package, in twisted.protocols.basic.FileSender.

File chunks are read here: https://github.com/twisted/twisted/blob/02a2b658cd1ade5d7f41f97d898913686313e615/src/twisted/protocols/basic.py#L892

CHUNK_SIZE is a constant defined as CHUNK_SIZE = 2**14 (16384 bytes, 16 KB) at https://github.com/twisted/twisted/blob/02a2b658cd1ade5d7f41f97d898913686313e615/src/twisted/protocols/basic.py#L857
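As a rough picture of what reading a file in CHUNK_SIZE pieces looks like, here is a minimal sketch. It is not Twisted's code: iter_chunks is a hypothetical helper and the in-memory file is a stand-in for the real file being sent. Only the constant value 2**14 comes from the Twisted source linked above.

```python
import io

CHUNK_SIZE = 2 ** 14  # 16384 bytes, the value Twisted's FileSender defines


def iter_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield successive reads of at most chunk_size bytes until EOF."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty read means end of file
            break
        yield chunk


if __name__ == "__main__":
    payload = io.BytesIO(b"x" * 40_000)  # stand-in for the file being sent
    print([len(chunk) for chunk in iter_chunks(payload)])
```

For a 40000 byte payload this yields two full 16384 byte chunks and a final 7232 byte one; with chunk_size=4096 the same payload takes ten reads.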

Golang

The Go implementation also defines the chunk size as 16 KB (recordSize := (1 << 14)).

See https://github.com/psanford/wormhole-william/blob/68dc3447a8585b060fb1e6836a23847700ab9207/wormhole/send.go#L363

Rust

The Rust implementation uses 4 KB.

https://github.com/magic-wormhole/magic-wormhole.rs/blob/6082d8b11d33b075285d31adc8a34ea03906f2cd/src/transfer/v1.rs#L585

I changed 4096 to 16384 and made a release build, which gives me 117 MB/s when the sender is the M1 pro Mac.

I think the Rust implementation could also use 16 KB.
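One way to compare several chunk sizes in the same run is a small sweep like the sketch below. It is only an illustration: SHA-256 hashing stands in for the work done per chunk, which is an assumption and not what magic-wormhole actually does for each record, and time_chunked_hash and sweep are hypothetical names. Absolute numbers depend on the machine, so none are shown.

```python
import hashlib
import io
import time


def time_chunked_hash(data, chunk_size):
    """Feed data through SHA-256 in chunk_size pieces; return (digest, seconds)."""
    source = io.BytesIO(data)
    hasher = hashlib.sha256()
    start = time.perf_counter()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means all data was consumed
            break
        hasher.update(chunk)  # stand-in for the real per-chunk work (assumption)
    return hasher.hexdigest(), time.perf_counter() - start


def sweep(data, chunk_sizes=(4096, 16384, 65536)):
    """Time the same payload at several chunk sizes."""
    return {size: time_chunked_hash(data, size) for size in chunk_sizes}


if __name__ == "__main__":
    data = bytes(64 * 1024 * 1024)  # 64 MiB of zeros as a test payload
    for size, (_digest, seconds) in sweep(data).items():
        print(f"{size:>6} byte chunks: {len(data) / seconds / 1e6:.0f} MB/s")
```

The digest is the same at every chunk size, which confirms that each run processed identical data and that only the timing differs.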