There is a TODO in next_interlaced_row asking to "change the interface of next_interlaced_row to take an output buffer instead of making us return a reference to a buffer that we own". I very much agree with this TODO - it seems that it would be best to output directly to the final buffer (as the next_frame API does) rather than forcing the caller to copy the bytes. I assume that outputting directly to the final buffer would be good for:
Reducing the number of memcpy-like calls
Reducing the number of L1 cache misses
Reducing the memory pressure overall
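To make the extra copy concrete, here is roughly what a caller of the current reference-returning API is forced to do. The types below are simplified stand-ins for the png crate's real ones (the real next_row also returns a Result), not actual crate code:

```rust
// Stand-in types, simplified to illustrate only the forced copy.
struct Row<'data> {
    data: &'data [u8],
}

struct Reader {
    // The decoder owns this buffer and hands out references into it.
    row_buffer: Vec<u8>,
}

impl Reader {
    fn next_row(&mut self) -> Option<Row<'_>> {
        // (actual decoding elided)
        Some(Row { data: &self.row_buffer })
    }
}

fn main() {
    let mut reader = Reader { row_buffer: vec![1, 2, 3, 4] };
    let mut final_buffer = vec![0u8; 4];
    if let Some(row) = reader.next_row() {
        // The caller must do a second, memcpy-like copy here; with a
        // `buf: &mut [u8]` parameter the decoder could instead write
        // straight into final_buffer.
        final_buffer.copy_from_slice(row.data);
    }
    println!("{:?}", final_buffer);
}
```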
FWIW, the performance considerations above mostly do not affect the next_frame API (which calls into lower-level functions like next_interlaced_row_impl for non-interlaced images) and therefore mostly do not affect the png crate's benchmarks. OTOH, users of the png crate who wish to post-process the output (e.g. to transform RGB into RGBA, or alpha-multiply) may wish to do such post-processing row-by-row (while the freshly decoded row is still hot in the L1 cache). More specifically, the performance considerations apply to:
image::codecs::png::PngReader::read (which calls next_row)
The current prototype integrating the png crate into Chromium (currently built on top of the image crate, but working directly with the png crate also wouldn't help because of the current shape of the next_row API)
So (given the presence of the TODO + the performance benefits), should I just go ahead and make a breaking change to the png::Reader::next_row and png::Reader::next_interlaced_row APIs?
That sounds good to me. If we're doing a breaking change, it might make sense to drop next_row entirely given that it is just a very thin wrapper over next_interlaced_row:
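If next_row is kept, the "very thin wrapper" point can be sketched as follows. Everything here is a self-contained mock (the InterlaceInfo field name and the decode logic are assumptions, not the png crate's actual implementation), just to show that next_row reduces to one line over next_interlaced_row:

```rust
#[derive(Debug)]
struct DecodingError;

#[derive(Debug, PartialEq)]
struct InterlaceInfo {
    // Hypothetical field: number of bytes written into the caller's buffer.
    line_bytes: usize,
}

struct Reader {
    rows_left: usize,
}

impl Reader {
    // Proposed shape: the caller supplies the output buffer.
    fn next_interlaced_row(&mut self, buf: &mut [u8]) -> Result<Option<InterlaceInfo>, DecodingError> {
        if self.rows_left == 0 {
            return Ok(None);
        }
        if buf.len() < 4 {
            return Err(DecodingError); // buf too small for the next row
        }
        buf[..4].copy_from_slice(&[1, 2, 3, 4]); // pretend-decoded pixels
        self.rows_left -= 1;
        Ok(Some(InterlaceInfo { line_bytes: 4 }))
    }

    // next_row is then a one-liner that discards the interlace info.
    fn next_row(&mut self, buf: &mut [u8]) -> Result<Option<usize>, DecodingError> {
        Ok(self.next_interlaced_row(buf)?.map(|info| info.line_bytes))
    }
}

fn main() {
    let mut reader = Reader { rows_left: 2 };
    let mut buf = [0u8; 4];
    while let Some(n) = reader.next_row(&mut buf).unwrap() {
        println!("decoded {} bytes: {:?}", n, &buf[..n]);
    }
}
```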
Concretely, I was thinking about the following changes:
Bump the version in Cargo.toml (this would be a breaking change)
Remove struct Row<'data> and struct InterlacedRow<'data>
Change fn next_row:
from: pub fn next_row(&mut self) -> Result<Option<Row>, DecodingError>
to: pub fn next_row(&mut self, buf: &mut [u8]) -> Result<Option<usize>, DecodingError>, documenting that:
an error is returned if buf is too small for the next row
None is returned if there is no next row
Change fn next_interlaced_row:
from: pub fn next_interlaced_row(&mut self) -> Result<Option<InterlacedRow>, DecodingError>
to: pub fn next_interlaced_row(&mut self, buf: &mut [u8]) -> Result<Option<InterlaceInfo>, DecodingError>, documenting that:
an error is returned if buf is too small for the next interlaced row
None is returned if there is no next row
WDYT? Are there some alternative API designs that we should consider first?
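For context on the row-by-row post-processing use case mentioned above, here is the kind of transform a caller might run on each freshly decoded row while it is still L1-hot (rgb_to_rgba is a hypothetical caller-side helper, not a png crate API):

```rust
// Expand an RGB row into an RGBA row, filling in opaque alpha.
fn rgb_to_rgba(rgb_row: &[u8], rgba_row: &mut [u8]) {
    for (src, dst) in rgb_row.chunks_exact(3).zip(rgba_row.chunks_exact_mut(4)) {
        dst[..3].copy_from_slice(src);
        dst[3] = 0xFF; // opaque alpha
    }
}

fn main() {
    // Pretend this row was just written into our buffer by
    // next_interlaced_row(&mut rgb_row) under the proposed API:
    let rgb_row = [10u8, 20, 30, 40, 50, 60];
    let mut rgba_row = [0u8; 8];
    // Transform immediately, while the row is still hot in the cache:
    rgb_to_rgba(&rgb_row, &mut rgba_row);
    println!("{:?}", rgba_row);
}
```

With the current reference-returning API, the decoded row lives in the decoder's own buffer, so this transform happens only after an extra copy; with a caller-provided buf, the copy disappears.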