image-rs / image

Encoding and decoding images in Rust
Apache License 2.0
5k stars 618 forks source link

Better parallelism controls #2357

Open Shnatsel opened 1 month ago

Shnatsel commented 1 month ago

image only provides the rayon compile-time feature for controlling parallelism. There are no runtime controls exposed, which means there isn't an obvious way to control things like:

  1. Whether format encoders/decoders will use multi-threading (OpenEXR, AVIF). Multi-threading in batch workloads that saturate all cores anyway would only cause slowdowns.
  2. Whether certain image operations will use parallelism internally. For example, resize #2223 and blur #986 could benefit, but unconditionally enabling parallelism would degrade performance for small images so we aren't using parallelism right now.
fintelia commented 1 month ago
  1. This is particularly tricky because methods like image::open or DynamicImage::save don't provide knobs to precisely control behavior, but I'm not thrilled about just defaulting to single-threaded implementations unless users go out of their way to request multithreading. For codecs, maybe the right thing to do is to add overrides on ImageEncoder / ImageDecoder to force single-threading for folks who need it, and otherwise base on whether the rayon feature is enabled?
  2. I think the decision of whether an image is too small to benefit from parallelism should be within the library. Afterall, we're probably the best suited to measure what the fixed overheads are.
awxkee commented 1 month ago

I'd also prefer to delegate decision to library how many threads to use, and if its even worth to the library, with simple flag if its enabled or no, with multi-threading enabled by default.

kornelski commented 3 weeks ago

I'd like to be able to limit max number of cores. I've got servers with 128 cores, and each library creating a threadpool for the entire core count makes everything an overkill.

Shnatsel commented 3 weeks ago

For deployment-time rather than coding-time thread count customization, I believe the RAYON_NUM_THREADS environment variable documented here would be a good option.

fintelia commented 3 weeks ago

Also, I'd expect we'd use the rayon global thread pool rather than creating our own. That seems like it would mitigate the issue by sharing the same threads with any other library that used rayon

kornelski commented 3 weeks ago

But I'd like to use max 4 cores for each image being decoded (there are diminishing results from splitting decoding further, and I don't want one job to monopolize the thread pool), but not have the whole 128-core server limited to 4 cores.

fintelia commented 3 weeks ago

Wait, doesn't rayon default to using the current thread pool? So if you call image::open from within a 4 thread pool, any parallel operations would be limited to those thread unless we went out of our way to use a different pool?

You'd have to keep track of a bunch of different thread pools, but that seems unavoidable if you're trying to subdivide 128 cores