whisperfish / rust-phonenumber

Library for parsing, formatting and validating international phone numbers.
Apache License 2.0
162 stars 55 forks source link

Loading default `DATABASE` is extremely slow in WASM #26

Open EndilWayfare opened 4 years ago

EndilWayfare commented 4 years ago

For some reason, the initial deref of the lazy_static takes several seconds, locking up the page while the first phone number is either parsed or formatted. After that, it's really snappy. Using the Chrome Dev Tools profiler, it looks like the bincode deserializing only takes ~40ms and most of the time is spent in Database::from. The callgraph is dominated by closure calls. I can only guess that the WASM backend can't optimize away all the closures as aggressively as the x86 one can? Screenshot_101420_063819_PM

Initializing the database in a web worker won't work, because it doesn't share memory with the main thread. The database is not Deserialize + Serialize, so you can't send it over manually either. I considered raw JS interop with one of the libphonenumber-based NPM packages out there, but there would be no way to convert the result to a phonenumber::PhoneNumber, since the struct is encapsulated such that the only way to make one is by parsing from a string. That leaves the most promising option as an unnecessary client type, something like

pub enum ClientPhoneNumber {
    PhoneNumber(phonenumber::PhoneNumber),
    DefinitelyAValidPhoneNumberIPromise(String),
}

and leaving the final say to the backend. This is certainly possible, but negates a lot of the benefit of having a unified client+server in full-stack Rust.

EndilWayfare commented 4 years ago

Update: Ok, to be clear, that's the unoptimized dev build. Timing the release build is much less egregious, though still a noticeable hitch. I didn't think about this at first, since the x86 dev build is rust-iliciously fast. Screenshot_101420_064533_PM It's a little harder to tell what's going on, since the function names are gone. 0.5s is definitely better than 4.0s, but it still feels like there's room for improvement.

When I get a chance, I may fork and hack on Database::from to see if de-closuring has any effect. If successful, would an associated pull request be welcome? Is the current testing situation sufficient?

yannleretaille commented 4 years ago

I had a look into this. The issue is that when the metadata gets initialized from the database, a syntax check on all regexps is performed (and theres a lot of them!). This is to prevent ugly runtime errors later on. The deserialization from the bincoded database itself is fast as you pointed out. flamegraph-before

The fix is quite straightforward actually, as we do not need to perform the syntax check at runtime if we already validate the regexps during the build phase when we parse Googles XML using loader. As an added benefit, this also means that there will never be a error due to invalid regexps in Googles data as the crate will not compile in that case. I already implemented it with good results on my machine and would expect the time to go down to less then 15ms in your wasm use case on your machine. image

I will post the PRs to rust-phonenumber and regex-cache (which required a minor modification) later.

EndilWayfare commented 4 years ago

Wow, @yannleretaille, you rock!

Gotta love compile-time guarantees. :)