image-rs / image-png

PNG decoding and encoding library in pure Rust
https://docs.rs/png
Apache License 2.0

Need public API to seek to another APNG frame #510

Open anforowicz opened 6 days ago

anforowicz commented 6 days ago

Disclaimer

This issue is based on my current, evolving understanding of the requirements that Chromium and Skia architectures impose upon an underlying codec implementation. I am opening this issue early to facilitate a transparent discussion, but before accepting the new APIs we should definitely require that I first finish prototyping the APNG-focused part of the Chromium/Blink/Skia integration (as a proof-of-concept that the new APIs actually work and solve my use case).

Problem 1a: No way to get to next image data

When Reader.next_interlaced_row reaches the end of a frame, it gets "stuck": subsequent calls keep returning None (which correctly indicates that there are no more rows in the current frame), and there is no way to proceed to the next frame.

Problem 1b: No way to read the next fcTL chunk

To successfully decode a frame, one needs to have its FrameControl data, but Info.frame_control won't be updated until the fcTL chunk is encountered and parsed. There is no way to ask Reader to get the next FrameControl metadata.

Problem 2: No way to go back and decode an earlier frame

Skia and Blink abstractions allow asking a codec to (again) decode an earlier frame:

Early discussion

Initially I tried solving problem 1 by publicly exposing Reader.read_until_image_data.

One solution to problem 2 would be to (brainstorming quality, please bear with me):

I think that even if we have a seek_to_frame API, we would still need to make read_until_image_data public (to support moving to the next fcTL when the input stream doesn't implement Seek).

I am not sure if there are other, better ways to solve problem 2.

anforowicz commented 5 days ago

Let me also be transparent that I am not yet 100% sure about the motivation here.

The immediate motivation is that the AnimatedPNGTests.ParseAndDecodeByteByByte test requires seeking to an earlier frame (at least as currently written), because it calls DecodeFrameBufferAtIndex in two separate loops: here and here. But fixing this test doesn't necessarily require seeking to an arbitrary earlier frame:

I am working with the Skia team to better understand the motivation for supporting seeking to an earlier frame. One hypothesis is that this is needed when decoding the current frame depends on first restoring one of the previous frames (e.g. because one or more earlier frames use DisposeOp.Previous). Discarding previous frames seems to be handled by SkiaImageDecoderBase::CanReusePreviousFrameBuffer, but maybe previous frames can still be discarded in some scenarios (just guessing: maybe when working under memory pressure?).

fintelia commented 5 days ago

Seeking back to the first frame makes sense to me, but I'd be curious what use seeking to a different frame would be? I'd expect that you wouldn't be able to reconstruct it without potentially needing to know all previous frames.

Another question I have looking at the current API is that we don't seem to distinguish between the static image and the first frame of the animation, which may or may not be the same. Might be worth separating them like image-webp does.

As far as API changes go, it might be worth investigating whether to unconditionally require Seek in the next major release. That would also enable the decoder to save the locations of metadata chunks and decode them lazily when requested, rather than always saving them without knowing whether they'll be needed.
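
To make the "save the locations and decode lazily" idea concrete, here is a minimal sketch that assumes nothing beyond the PNG container layout (8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the payload, and a 4-byte CRC). The names ChunkLocation, index_chunks and read_chunk_data are hypothetical and are not part of the png crate; CRC checking is left out on purpose.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Where one chunk's payload lives in the stream (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChunkLocation {
    pub kind: [u8; 4],
    /// Absolute offset of the first payload byte, just after length + type.
    pub data_offset: u64,
    pub data_len: u32,
}

const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', b'\r', b'\n', 0x1a, b'\n'];

/// Walks the chunk headers and records payload locations, seeking over
/// each payload and its CRC instead of reading them.
pub fn index_chunks<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<ChunkLocation>> {
    let mut signature = [0u8; 8];
    reader.read_exact(&mut signature)?;
    if signature != PNG_SIGNATURE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a PNG stream"));
    }

    let mut chunks = Vec::new();
    loop {
        let mut header = [0u8; 8];
        reader.read_exact(&mut header)?;
        let data_len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
        let kind = [header[4], header[5], header[6], header[7]];
        let data_offset = reader.stream_position()?;
        chunks.push(ChunkLocation { kind, data_offset, data_len });
        if &kind == b"IEND" {
            return Ok(chunks);
        }
        // Skip the payload plus the 4-byte CRC that follows it.
        reader.seek(SeekFrom::Current(i64::from(data_len) + 4))?;
    }
}

/// Reads one chunk's payload on demand (no CRC verification in this sketch).
pub fn read_chunk_data<R: Read + Seek>(
    reader: &mut R,
    loc: &ChunkLocation,
) -> io::Result<Vec<u8>> {
    reader.seek(SeekFrom::Start(loc.data_offset))?;
    let mut data = vec![0u8; loc.data_len as usize];
    reader.read_exact(&mut data)?;
    Ok(data)
}
```

With an index like this, a text or ICC chunk would only be read when a caller asks for it, and the same offsets could in principle be reused to jump back to a particular fcTL.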

HeroicKatora commented 4 days ago

The comparison to image-webp and tiff is interesting, there's lots of overlap in the required seek functionality. In particular, I can well see the use-case in building a list of all fcTL chunks which includes the necessary information to compute any presentation dependency (i.e. walk backwards until a full dispose, or opaque blending with the full area). With that graph an efficient lookup and rebuild for individual frames is also possible, similar to keyframes. I think all of this can algorithmically and in types live separate from the encoder itself. That introduces a slight imprecision where the dependency graph of an image might be used with a mismatched decoder, but I think that's more than acceptable instead of introducing a lot of complexity within the decoder.
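
A rough sketch of that backwards walk, under stated assumptions: FrameMeta, Dispose and Blend are hypothetical stand-ins for the fields of an fcTL chunk (not the png crate's own types), and the function only returns the earliest frame that must be decoded rather than a full dependency graph. It is deliberately conservative: when in doubt it walks further back.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dispose { None, Background, Previous }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Blend { Source, Over }

/// The subset of fcTL data needed to reason about dependencies (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct FrameMeta {
    pub x_offset: u32,
    pub y_offset: u32,
    pub width: u32,
    pub height: u32,
    pub dispose: Dispose,
    pub blend: Blend,
}

impl FrameMeta {
    fn covers_canvas(&self, canvas_width: u32, canvas_height: u32) -> bool {
        self.x_offset == 0
            && self.y_offset == 0
            && self.width == canvas_width
            && self.height == canvas_height
    }
}

/// Returns the smallest index `k` such that decoding frames `k..=target`
/// in order, starting from a blank canvas, reproduces frame `target`.
/// Panics if `target` is out of bounds.
pub fn first_required_frame(
    frames: &[FrameMeta],
    canvas_width: u32,
    canvas_height: u32,
    target: usize,
) -> usize {
    // A full-canvas frame blended with Source overwrites every pixel,
    // so nothing before it matters.
    let t = &frames[target];
    if t.covers_canvas(canvas_width, canvas_height) && t.blend == Blend::Source {
        return target;
    }
    let mut i = target;
    while i > 0 {
        let prev = &frames[i - 1];
        let full = prev.covers_canvas(canvas_width, canvas_height);
        match prev.dispose {
            // Full dispose: the canvas is entirely cleared before frame `i`.
            Dispose::Background if full => return i,
            // The previous frame alone defines the canvas seen by frame `i`.
            Dispose::None if full && prev.blend == Blend::Source => return i - 1,
            // Partial frames and Dispose::Previous: keep walking backwards.
            _ => {}
        }
        i -= 1;
    }
    0
}
```

A client could run this over the collected fcTL list once and treat the returned indices like keyframes when deciding where to restart decoding.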


As an aside: with regards to Seek as a bound, these exact issues are good reasons to investigate Seek as a runtime property instead of a compile-time one. You're now aware of the problem space, please consider this solution prototype. Significant unsafe and an optional unstable attribute, so escalate these internally to wherever appropriate.

anforowicz commented 1 day ago

For now I've implemented seeking to an earlier frame by 1) rewinding the input stream to the beginning and recreating png::Reader and 2) reading/moving frames one-by-one until the desired frame. This lets me make further progress on passing additional Chromium/Blink tests, and set the efficient-seeking problem aside as something that we may reconsider later - for now I've opened https://crbug.com/371060427 to track this on my side.
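
The shape of that workaround, as a self-contained sketch: ForwardDecoder is a hypothetical trait standing in for a forward-only decoder (it is not the png::Reader API), and OneByteFrames is a toy format in which every frame is a single byte, included only so the sketch can be exercised.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for a decoder that can only move forwards.
pub trait ForwardDecoder<R>: Sized {
    fn new(input: R) -> io::Result<Self>;
    /// Consumes the next frame; returns Ok(false) when no frames remain.
    fn advance_frame(&mut self) -> io::Result<bool>;
}

/// Rewinds `input`, recreates the decoder, and skips `index` frames so that
/// the next call to `advance_frame` yields frame `index`.
pub fn reopen_at_frame<R, D>(mut input: R, index: usize) -> io::Result<D>
where
    R: Read + Seek,
    D: ForwardDecoder<R>,
{
    input.seek(SeekFrom::Start(0))?;
    let mut decoder = D::new(input)?;
    for _ in 0..index {
        if !decoder.advance_frame()? {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "frame index out of range",
            ));
        }
    }
    Ok(decoder)
}

/// Toy decoder: each frame is exactly one byte of input.
pub struct OneByteFrames<R> {
    input: R,
    pub last_frame: Option<u8>,
}

impl<R: Read> ForwardDecoder<R> for OneByteFrames<R> {
    fn new(input: R) -> io::Result<Self> {
        Ok(Self { input, last_frame: None })
    }

    fn advance_frame(&mut self) -> io::Result<bool> {
        let mut byte = [0u8; 1];
        if self.input.read(&mut byte)? == 0 {
            return Ok(false);
        }
        self.last_frame = Some(byte[0]);
        Ok(true)
    }
}
```

The cost is linear in the target index on every backwards seek, which is the efficiency problem deferred above.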


we don't seem to distinguish between the static image and the first frame of the animation, which may or may not be the same.

Yes. FWIW this is not blocking - the client can recognize the situation (animation_control present, but frame_control missing in the image Info) and handle it as needed (skipping the first frame somehow - either via next_frame, or by a new API that wraps read_until_image_data).
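
Spelled out, the client-side check could look like the sketch below. FirstImage and classify_first_image are hypothetical names; the two booleans are assumed to come from checking whether animation_control and frame_control are present in Info once the decoder has reached the first image data.

```rust
/// What the first image data in the file represents (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FirstImage {
    /// No acTL chunk: an ordinary, non-animated PNG.
    StaticOnly,
    /// acTL present and an fcTL seen before the image data:
    /// the default image doubles as the first animation frame.
    FirstAnimationFrame,
    /// acTL present but no fcTL before the image data:
    /// the default image is not part of the animation and should be skipped.
    DefaultImageOutsideAnimation,
}

pub fn classify_first_image(
    animation_control_present: bool,
    frame_control_present: bool,
) -> FirstImage {
    match (animation_control_present, frame_control_present) {
        (false, _) => FirstImage::StaticOnly,
        (true, true) => FirstImage::FirstAnimationFrame,
        (true, false) => FirstImage::DefaultImageOutsideAnimation,
    }
}
```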


Seeking back to the first frame makes sense to me, but I'd be curious what use seeking to a different frame would be? I'd expect that you wouldn't be able to reconstruct it without potentially needing to know all previous frames.

I can well see the use-case in building a list of all fcTL chunks which includes the necessary information to compute any presentation dependency (i.e. walk backwards until a full dispose, or opaque blending with the full area). With that graph an efficient lookup and rebuild for individual frames is also possible, similar to keyframes. I think all of this can algorithmically and in types live separate from the encoder itself.

Right - see how:

(Note that the code above is codec-independent.)

But, I am still not sure in what scenario a previous/required animation frame would be unavailable. If required frames may be discarded (out-of-viewport? low-on-memory?) then maybe it should be okay to restart the animation from the first frame? I dunno...