nodeca / probe-image-size

Get image size without full download. Supported image types: JPG, GIF, PNG, WebP, BMP, TIFF, SVG, PSD, ICO.
MIT License
978 stars · 77 forks

How many bytes are needed to make a determination? #38

Closed: summera closed this issue 4 years ago

summera commented 4 years ago

Based on my testing, it seems the number of bytes needed to retrieve the metadata about the image varies. Is there some maximum, or is there no cap on the number of bytes needed before a determination on image size can be made?

I'm in a situation where I'd like to retrieve the metadata from a stream using probe-image-size and then continue streaming the data into some transformations, without creating a brand-new stream. I think the best way to do this would be to use a PassThrough to buffer the data needed by probe-image-size before sending it off to the transformations. So it looks something like this:

```js
const { PassThrough, pipeline } = require('stream');
const probe = require('probe-image-size');

const readPassThrough = new PassThrough();
const readProxy = pipeline(readStream, readPassThrough, () => {});
const fileMetadata = await probe(readStream);
readProxy.pipe(transformer).pipe(writeStream);
```

The problem I ran into is that probe-image-size fetched about 50815 bytes for a JPEG I tested with before being able to output the image metadata. This is bigger than the default `highWaterMark` for the `PassThrough`, which is 16384 bytes. So the `PassThrough` couldn't accept the 50815 bytes, causing backpressure, which stopped the entire pipeline. To make this example work, I'd have to use `const readPassThrough = new PassThrough({ highWaterMark: 51200 });`. I'm wondering if there is some max value I can set `highWaterMark` to in order to guarantee I wouldn't run into backpressure issues. If not, I may have to just create a new read stream after probing the metadata and accept the performance hit.

Thanks in advance for the help!

puzrin commented 4 years ago

There is no guaranteed cap for some formats like JPEG. I'd suggest inspecting the sync versions; those are easy to understand.

Internal parsers eat chunks of any size until they succeed or fail, immediately consuming whatever is available on input.
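To illustrate why JPEG has no fixed cap: the dimensions live in an SOFn segment, and any number of variable-length segments (EXIF blocks, ICC profiles, embedded thumbnails) may precede it, so a parser has to walk markers until it finds one. A rough standalone sketch of that walk (my own simplified code, not the library's parser; it ignores standalone markers and entropy-coded data):

```js
// Walk JPEG markers until an SOFn segment yields the dimensions.
// Returns null when the buffer doesn't (yet) contain enough data.
function jpegSize(buf) {
  if (buf.length < 4 || buf.readUInt16BE(0) !== 0xffd8) return null; // SOI
  let pos = 2;
  while (pos + 4 <= buf.length) {
    if (buf[pos] !== 0xff) return null; // lost marker sync
    const marker = buf[pos + 1];
    const len = buf.readUInt16BE(pos + 2); // segment length, incl. these 2 bytes
    // SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8), DAC (0xCC)
    if (marker >= 0xc0 && marker <= 0xcf &&
        marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc) {
      if (pos + 9 > buf.length) return null; // need more bytes
      return {
        height: buf.readUInt16BE(pos + 5), // payload: precision, height, width
        width: buf.readUInt16BE(pos + 7),
      };
    }
    pos += 2 + len; // skip over this segment to the next marker
  }
  return null; // ran out of data before finding SOF
}
```

Because each `len` is data-dependent, the number of bytes consumed before the SOF segment appears is unbounded in principle, which matches the ~50 KB observed above.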

summera commented 4 years ago

@puzrin thanks for the quick response!

I'd suggest inspecting the sync versions; those are easy to understand.

Are you referring to the parsers in the parse_sync directory?

How about GIFs? Based on the gif sync parser it looks like all the metadata is always in the first 10 bytes. Is this correct?

puzrin commented 4 years ago

How about GIFs? Based on the gif sync parser

Yes, if you parse GIFs only, then 10 bytes will be enough. But such a task is very rare in the real world.
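The fixed GIF layout is easy to check by hand: the signature (GIF87a or GIF89a) occupies bytes 0-5, and the logical-screen width and height follow as little-endian uint16 values at offsets 6 and 8. A minimal standalone sketch (my own code, not the library's parser):

```js
// Read GIF dimensions from the first 10 bytes of the file.
function gifSize(buf) {
  if (buf.length < 10) return null; // need the full header
  const sig = buf.toString('ascii', 0, 6);
  if (sig !== 'GIF87a' && sig !== 'GIF89a') return null;
  return {
    width: buf.readUInt16LE(6),  // logical screen width
    height: buf.readUInt16LE(8), // logical screen height
  };
}
```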

summera commented 4 years ago

Ok great. I only really need the width/height before transforming in the case of a GIF, so I can work with that. Thanks for all the help 👍