linebender / norad

Rust crate for working with Unified Font Object files
Apache License 2.0
43 stars 12 forks source link

Improving disk I/O performance on Windows #334

Closed RickyDaMa closed 10 months ago

RickyDaMa commented 11 months ago

Full disclosure, this is basically an advert for a little crate I wrote over the weekend: close_already

In short, by using my crate (under 1000 SLoC including the two transitive dependencies*) I was able to speed up norad (without rayon) writing a big UFO to disk by 67%, and a further ~10% when the rayon feature was enabled (despite a suboptimal implementation with conflicting threadpools). You can read more about how this works in the crate's README

I still need to benchmark read performance for exact numbers, but there should also be performance improvements there as well

I've made the library work as a no-op on other compile targets and tried to keep it simple & uninvasive to integrate, so I was wondering if norad would be interested in a PR to show what this looks like?

* EDIT: when using the default backend-threadpool

RickyDaMa commented 10 months ago

Okay, the crate is now a lot more polished and ready for use. However, I've come across an issue with integrating with norad specifically, mainly due to limitations in how you can specify features in Cargo.tomls

Background: close_already now supports multiple different backends to avoid conflicting threadpools where you're already using something like rayon, or an async runtime. These backends are chosen from with mutually exclusive backend-blah features

I presume the rayon dependency/feature isn't enabled in this crate because rayon is relatively heavy as dependencies go, which is fair enough. Per close_already's backend-selection, when we are using rayon, I want to activate the backend-rayon feature, so it doesn't have a separate threadpool 'fighting' with rayon's

That's all well and good, until we want to make close_already default when rayon isn't enabled (given that's where the biggest gains are), but using a lighter backend (the SLoC quote from before was using the now-named backend-threadpool)

An attempt at this setup looks something like this:

[dependencies]
# -- snip --
rayon = { version = "1.3.0", optional = true }
# Note: no backend enabled here
close_already = { version = "0.3", default-features = false }

[features]
# By default, enable the backend-threadpool feature of close_already
default = ["close_already/backend-threadpool"]
# The "rayon" feature enables the dependency rayon and activates the backend-rayon feature
rayon = ["dep:rayon", "close_already/backend-rayon"]

The problem now is that building with the rayon feature has two backends enabled, resulting in close_already failing to compile. There is no way in the Cargo.toml to specify "de-activate backend-threadpool when the rayon feature is enabled". (Tangent: mutually exclusive features don't even exist at a Cargo.toml level, you have to detect it at compile time in source code and compile_error!, which is ugly and awkward)

Potential fixes:

  1. Require users to specify no-default-features = true when they want to use norad with the rayon feature
  2. Use close_already in either no-rayon or rayon mode, but not both. This however then requires #[cfg] flags throughout the code as close_already isn't always present
  3. Always use backend-rayon (at that point you could just remove the rayon feature and make it default, since rayon will be compiled in anyway)
  4. Change close_already somehow to resolve which backend you actually want if you enable multiple (how??)

Interested in thoughts, feelings, and emotions from people. If we want more reproducible benchmarks to make a decision based on close_already's value proposition, I'm happy to get something put together for that and share it here

RickyDaMa commented 10 months ago

Ran the numbers:

default (baseline):
read & parse Roboto-Regular.ufo
                        time:   [223.82 ms 224.33 ms 224.83 ms]
write Roboto-Regular.ufo
                        time:   [2.0756 s 2.0973 s 2.1211 s]

close_already (rayon):
read & parse Roboto-Regular.ufo
                        time:   [231.55 ms 232.02 ms 232.50 ms]
write Roboto-Regular.ufo
                        time:   [987.44 ms 1.0257 s 1.0691 s]

close_already (threadpool):
read & parse Roboto-Regular.ufo
                        time:   [223.36 ms 224.00 ms 224.68 ms]
write Roboto-Regular.ufo
                        time:   [975.15 ms 1.0152 s 1.0596 s]

---

rayon (baseline):
read & parse Roboto-Regular.ufo
                        time:   [28.253 ms 28.368 ms 28.492 ms]
write Roboto-Regular.ufo
                        time:   [867.16 ms 922.49 ms 985.35 ms]

close_already (rayon) + rayon:
read & parse Roboto-Regular.ufo
                        time:   [28.643 ms 28.820 ms 29.010 ms]
write Roboto-Regular.ufo
                        time:   [831.17 ms 871.48 ms 915.87 ms]

close_already (threadpool) + rayon:
read & parse Roboto-Regular.ufo
                        time:   [28.395 ms 28.495 ms 28.603 ms]
write Roboto-Regular.ufo
                        time:   [830.45 ms 870.99 ms 914.95 ms]

Takeaways:

Conclusion: I'll PR always-on close_already using backend-threadpool (the lightweight default) for writes only

cmyr commented 10 months ago

So this is tricky, and it's a problem I've run into before.

Basically: cargo features are intended to be additive. That is to say, a feature is not intended to be used to switch between impls, and enabling a feature should never 'turn off' any existing code, only 'turn on' something.

One possible solution would be to allow your crate to compile two backends, but have one of them just be unused, if rayon is available? We'll still compile it, but the generated code should at least be pruned during dead code analysis. (I believe that this is the conclusion you've reached anyway?)

There's also definitely an argument for just always enabling rayon in norad. I have historically tried to be judicious in my use of dependencies, and that is the rationale for having rayon behind a feature; there are patterns of using norad where you aren't relying on us to load everything, in which case rayon would be unnecessary, but maybe I could also just relax and leave it on, if that would be a good simplification.

RickyDaMa commented 10 months ago

One possible solution would be to allow your crate to compile two backends, but have one of them just be unused, if rayon is available?

This would require a fairly intensive refactor of close_already, and I think would open up a new question: if multiple backends are selected, how do you know which one is the one to use?

I believe that this is the conclusion you've reached anyway?

The conclusion I reached was that using the backend-threadpool didn't see a meaningful performance difference to backend-rayon, so I'm just using backend-threadpool everywhere since it is lightweight (see f2478e3f3d7654f8fe20bd4f2100bf8411b37b41)

I have historically tried to be judicious in my use of dependencies, and that is the rationale for having rayon behind a feature; there are patterns of using norad where you aren't relying on us to load everything

Yeah this is 100% valid and I approach my own Rust projects the same way: aggressively minimise dependencies and their features wherever possible. I think whether rayon should be default is up for debate (but outside the scope of this issue), but having is be optional is a good thing, for more streamlined use cases