Thomasdezeeuw closed this issue 2 years ago.
Thanks for the report @Thomasdezeeuw!
Can you tell me a little bit more about your use-case? Especially interested in why you need to process 160MB quickly on a debug build.
Running a quick benchmark, the baseline CRC32 algorithm in debug mode on my M1 manages just about 2MB/s, which seems like it would line up with what you are seeing, give or take some overhead related to gRPC and DEFLATE.
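As a rough consistency check, take the figures from the issue report: 160 MB received in almost two minutes, with about 75% of that time spent in `crc32fast::Hasher::update`. Treating "almost two minutes" as roughly 120 s gives an effective CRC throughput of:

```latex
\text{CRC throughput} \approx \frac{160\ \text{MB}}{0.75 \times 120\ \text{s}} = \frac{160\ \text{MB}}{90\ \text{s}} \approx 1.8\ \text{MB/s}
```

That is in the same range as the ~2MB/s benchmark figure.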
I'm not sure if it's possible to go any faster with the standard Rust debug profile tbh. The code is very straightforward, and it's generally known that Rust debug builds can be quite slow.
Happy to help you look into this further in any case, just saying that as a disclaimer upfront!
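You can reproduce this kind of measurement without gRPC or DEFLATE involved. Time a CRC32 over an in-memory buffer, then compare a run with `--release` against one without. The sketch below is a minimal, std-only stand-in. It uses a simple byte-at-a-time, table-driven CRC32 (IEEE polynomial in reflected form, `0xEDB88320`), not crc32fast's actual baseline implementation. The names `crc32_table`, `crc32_update` and `throughput_mb_per_s` are hypothetical. Absolute numbers will differ from crc32fast's; what matters is the gap between the two build profiles.

```rust
use std::time::Instant;

/// Builds the 256-entry lookup table for the reflected IEEE CRC32
/// polynomial (0xEDB88320). Hypothetical helper, not crc32fast's API.
fn crc32_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    for i in 0..256u32 {
        let mut c = i;
        for _ in 0..8 {
            // Shift right one bit; XOR in the polynomial if the low bit was set.
            c = if c & 1 != 0 { 0xEDB8_8320 ^ (c >> 1) } else { c >> 1 };
        }
        table[i as usize] = c;
    }
    table
}

/// Continues a CRC32 from a previous value `crc` (pass 0 to start).
/// Because of the pre- and post-inversion, calls can be chained over chunks.
fn crc32_update(table: &[u32; 256], crc: u32, data: &[u8]) -> u32 {
    let mut state = !crc;
    for &byte in data {
        let index = ((state ^ byte as u32) & 0xFF) as usize;
        state = table[index] ^ (state >> 8);
    }
    !state
}

/// Hashes `data` once and returns (checksum, throughput in MB/s),
/// using 1 MB = 1_000_000 bytes.
fn throughput_mb_per_s(table: &[u32; 256], data: &[u8]) -> (u32, f64) {
    let start = Instant::now();
    let crc = crc32_update(table, 0, data);
    // Guard against a zero duration on very small inputs.
    let secs = start.elapsed().as_secs_f64().max(1e-9);
    (crc, data.len() as f64 / 1_000_000.0 / secs)
}
```

To compare the two profiles, call `throughput_mb_per_s(&crc32_table(), &vec![0u8; 16 * 1024 * 1024])` from a `main` and run it once with `--release` and once without. The difference shows how much of the gap comes from the build profile alone.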
We're streaming SQL query results over gRPC.
The readme claims 1500MB/s for the baseline version; isn't 2MB/s significantly off that target? Are the release optimisations really responsible for a 750x increase in throughput? I've seen 20-30x before, but 750x is really another order of magnitude. That's rather surprising to me (not that I'm disputing it).
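Here is the gap being asked about, spelled out, along with what it means for the 160 MB transfer (CRC work only, ignoring gRPC and DEFLATE):

```latex
\frac{1500\ \text{MB/s}}{2\ \text{MB/s}} = 750,
\qquad
\frac{160\ \text{MB}}{2\ \text{MB/s}} = 80\ \text{s}
\quad\text{vs.}\quad
\frac{160\ \text{MB}}{1500\ \text{MB/s}} \approx 0.11\ \text{s}
```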
Do you think it's possible to get, e.g., 100MB/s? That would really solve this issue.
Would very much appreciate the help. Perhaps debug builds would benefit from `#[inline]` attributes on some functions?
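The sketch below shows roughly what that suggestion looks like in code. It assumes a hypothetical hot per-byte helper `step` that is called from a loop in a public `update` function; both names are made up for illustration, and this is not necessarily what the commit mentioned in this thread changes. `#[inline]` is only a hint, and `#[inline(always)]` is a stronger request. Neither guarantees inlining, so any effect on an unoptimised build has to be measured rather than assumed.

```rust
/// Builds the 256-entry table for the reflected IEEE CRC32 polynomial.
pub fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    for i in 0..256u32 {
        let mut c = i;
        for _ in 0..8 {
            c = if c & 1 != 0 { 0xEDB8_8320 ^ (c >> 1) } else { c >> 1 };
        }
        table[i as usize] = c;
    }
    table
}

/// Hypothetical per-byte helper on the hot path. `#[inline(always)]` asks the
/// compiler to fold it into the caller's loop instead of emitting a call per
/// byte; it is a request, not a guarantee.
#[inline(always)]
fn step(table: &[u32; 256], state: u32, byte: u8) -> u32 {
    table[((state ^ byte as u32) & 0xFF) as usize] ^ (state >> 8)
}

/// Hypothetical public entry point. A plain `#[inline]` hint makes the body
/// available for inlining into callers in other crates (e.g. a decompressor).
#[inline]
pub fn update(table: &[u32; 256], crc: u32, data: &[u8]) -> u32 {
    let mut state = !crc;
    for &byte in data {
        state = step(table, state, byte);
    }
    !state
}
```

Only timing both build profiles can show whether these attributes change debug-mode throughput.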
Would you be able to try out this commit via your setup?
On my machine, this bumps up debug-mode throughput from 2MB/s to ~205MB/s.
That commit saves us 4 seconds on a 25MB file, so that would be ~26 seconds on the original 160MB! Furthermore it essentially removes crc32 from the collected process sample, from 1505 down to 12. Fantastic work!
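The ~26 second estimate scales the measured saving linearly with input size, which assumes the CRC cost is proportional to the number of bytes processed:

```latex
4\ \text{s} \times \frac{160\ \text{MB}}{25\ \text{MB}} = 4\ \text{s} \times 6.4 = 25.6\ \text{s} \approx 26\ \text{s}
```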
Great! Released this as 1.3.1, so picking up that version should be a permanent fix for you.
Closing this ticket for now, feel free to re-open if you continue to see issues!
Thanks @srijs for the quick response, fix and release!
Apologies in advance, because this is going to be a rather poor bug report. However, I wanted to report it anyway.
I've found that this crate is unreasonably slow in debug mode (i.e. without `--release`; with it, it works fine) on macOS M1. Receiving 160 MB of compressed data over gRPC takes almost two minutes. The data is coming from localhost, so latency is not the issue.

I've attached a Mac process sample. It's not easily readable, but basically it calls `crc32fast::Hasher::update` from `flate2::crc::Crc::update` roughly 1500 times, which takes up ~75% of those two minutes. If I change to release mode (i.e. with `--release`), this time is reduced to about 4 seconds.