saucecontrol / PhotoSauce

MagicScaler high-performance, high-quality image processing pipeline for .NET
http://photosauce.net/
MIT License

Optimal approach to chain multiple transformations and persist output stream at each stage. #55

Closed ryanmar closed 3 years ago

ryanmar commented 4 years ago

I'm looking for a better approach to persisting multiple image dimensions. We average about 100K ~12MP images daily and keep cached renditions of each at 1200px, 600px, and 240px wide. We are using MagicImageProcessor.ProcessImage to transform these streams in memory. Is there a better approach that would use the ProcessingPipeline constructs? There aren't any examples, and the documentation leads me to believe that it composes multiple transforms but allows only one "output" stream at the end. Is that correct? Or is there a way to both capture the stream at each stage and let it become the source stream for an additional stage? And would any of it actually be more efficient (memory, compute) than our current approach of orchestrating the streams and multiple calls to ProcessImage, roughly as sketched below? Many thanks. This library runs circles around the nightmare that was our old GDI approach.
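For concreteness, this is roughly the shape of our current code (a simplified sketch; the stream names and the width-only settings are placeholders for our real cache plumbing):

```csharp
using System.IO;
using PhotoSauce.MagicScaler;

static class Resizer
{
    // Current approach: three independent ProcessImage calls,
    // each one decoding the full ~12MP source again.
    public static void SaveSizes(Stream source, Stream out1200, Stream out600, Stream out240)
    {
        foreach (var (width, dest) in new[] { (1200, out1200), (600, out600), (240, out240) })
        {
            source.Position = 0; // rewind so each pass reads the original from the start
            MagicImageProcessor.ProcessImage(source, dest, new ProcessImageSettings { Width = width });
        }
    }
}
```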

saucecontrol commented 4 years ago

the documentation leads me to believe that it composes multiple transforms but allows only one "output" stream at the end. Is that correct?

That's correct. The ProcessingPipeline model was built around a few specific low-level interop scenarios, and it would need more work to fully support what you're looking for. I have a couple of sample gists, written up for other issues, that you might find helpful for seeing how it's used:

It would be possible, using the same basic pattern as in those gists, to write the output of your first resize into a memory buffer, use that buffer as a source for a null transform and save to produce your first image, and then re-use the same buffer as the source for your second and third resizes.
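A rough sketch of that shape, using the stream-based ProcessImage overload instead of the lower-level pixel-source pattern from the gists (so the intermediate buffer holds the encoded 1200px image rather than raw pixels; the names and sizes are placeholders):

```csharp
using System.IO;
using PhotoSauce.MagicScaler;

static class ChainedResizer
{
    public static void SaveSizes(Stream source, Stream out1200, Stream out600, Stream out240)
    {
        using var buffer = new MemoryStream();

        // Decode the full-size source once, resizing to 1200px into an in-memory buffer.
        MagicImageProcessor.ProcessImage(source, buffer, new ProcessImageSettings { Width = 1200 });

        // Persist the 1200px output as-is; no second transform is needed here
        // because the buffer already holds the encoded image.
        buffer.Position = 0;
        buffer.CopyTo(out1200);

        // Re-use the same buffer as the source for the smaller sizes,
        // so the ~12MP original is decoded only once.
        buffer.Position = 0;
        MagicImageProcessor.ProcessImage(buffer, out600, new ProcessImageSettings { Width = 600 });

        buffer.Position = 0;
        MagicImageProcessor.ProcessImage(buffer, out240, new ProcessImageSettings { Width = 240 });
    }
}
```

Note that in this variant the 600px and 240px outputs are resampled from a re-encoded 1200px intermediate rather than from the original, and the entire intermediate sits in memory at once, which ties into the trade-offs below.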

would any of it actually be more efficient (memory, compute) than our current approach of orchestrating the streams and multiple calls to ProcessImage?

Probably not, actually. At least not enough to justify the added complexity. Using that technique would disable the planar (YCbCr) processing ProcessImage does with a JPEG source, making more work for both the pipeline and the output codec. And it would use more memory, since the entire 1200px output would be held in memory at once, which the pipeline normally avoids.

I have a longer-term vision for extending the pipeline model to support forking/merging for more advanced processing. I'll keep this use case in mind when I get around to designing and building that out.

This library runs circles around the nightmare that was our old GDI approach.

I can imagine, with that many images of that size 👀. It's always interesting to hear what kind of volume people are throwing at this thing. Glad to know it's working out.