samvera / node-iiif

This module provides a full-featured IIIF Image API 2.1 and 3.0 image processor. It covers only the image processing pipeline itself, leaving input and output to the caller.
Apache License 2.0

Refactor dimension handling and the transformation pipeline #28

Closed mbklein closed 1 year ago

mbklein commented 1 year ago

This is a pretty big refactoring of how node-iiif retrieves and makes use of image dimensions, as well as the pipeline setup. It was done to facilitate two things:

In the process, I also learned how to use sharp's built-in metadata() method when the source is a stream, something I had experimented with previously without success. Not only is it possible, but benchmarks showed it to be about 40-60% faster than the probe-image-size package the default dimension function had been using. More efficiency, fewer dependencies.
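
For readers unfamiliar with the technique, here is a minimal sketch of reading dimensions from a stream with sharp. It is an illustration only, not the code from this PR. A sharp instance created without an input accepts piped image data, and metadata() resolves once that input has been received. The helper name `dimensionsFromStream` and its injectable `createPipeline` parameter are hypothetical; the parameter exists only so the function can be exercised with a stand-in where sharp is not installed.

```javascript
// Hypothetical helper, not node-iiif's actual dimension function.
// `createPipeline` defaults to an input-less sharp instance; it is a parameter
// only so a stand-in can be supplied where sharp is not installed.
async function dimensionsFromStream(source, createPipeline = () => require('sharp')()) {
  const pipeline = createPipeline(); // sharp() with no input accepts piped image data
  source.pipe(pipeline);
  const { width, height, format } = await pipeline.metadata();
  return { width, height, format };
}

// Usage (assumes sharp is installed and ./image.tif exists):
//   const fs = require('node:fs');
//   dimensionsFromStream(fs.createReadStream('./image.tif')).then(console.log);
```

The sketch does not handle errors emitted by the source stream; a real dimension function would need to reject when the source fails.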

Despite the extensive changes, the module is still interface-compatible with v3.x. I think a minor version bump would be warranted on release.

mbklein commented 1 year ago

Some benchmarks:

node-iiif v3.1.2:

$ time curl -svo /dev/null http://localhost:3000/iiif/2/98-pyramid/full/3000,/0/default.jpg
*   Trying 127.0.0.1:3000...
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET /iiif/2/98-pyramid/full/3000,/0/default.jpg HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.82.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< X-Powered-By: tinyhttp
< Access-Control-Allow-Headers: *
< Access-Control-Allow-Methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
< Access-Control-Allow-Origin: *
< Content-Type: image/jpeg
< etag: W/"107d67-dXEkzjCmTRpc/pInIctorvC6MG4"
< Date: Tue, 21 Mar 2023 15:03:28 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Content-Length: 1080679
< 
* Closing connection 0

curl 0.00s user 0.01s system 0% cpu 40.512 total

node-iiif from this PR branch:

$ time curl -svo /dev/null http://localhost:3000/iiif/2/98-pyramid/full/3000,/0/default.jpg
*   Trying 127.0.0.1:3000...
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET /iiif/2/98-pyramid/full/3000,/0/default.jpg HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.82.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< X-Powered-By: tinyhttp
< Access-Control-Allow-Headers: *
< Access-Control-Allow-Methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
< Access-Control-Allow-Origin: *
< Content-Type: image/jpeg
< etag: W/"10554c-yt9DtAPYwz7Z57N3wYf8EEKQJyQ"
< Date: Tue, 21 Mar 2023 15:06:19 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Content-Length: 1070412
< 
* Connection #0 to host localhost left intact

curl 0.00s user 0.01s system 1% cpu 0.651 total

40.5s down to 651ms seems like a decent return on investment. 😄
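
A single `time curl` run is easy to reproduce but can be noisy. As a rough sketch (not part of this PR), the same measurement could be repeated from Node using only the standard library. The function names `timeRequest` and `medianRequestTime` are hypothetical, and the URL in the usage comment is the one from the curl commands above.

```javascript
const http = require('node:http');
const { performance } = require('node:perf_hooks');

// Time one GET request from send to last byte, discarding the body
// (roughly what `curl -o /dev/null` does).
function timeRequest(url) {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    http.get(url, (res) => {
      let bytes = 0;
      res.on('data', (chunk) => { bytes += chunk.length; });
      res.on('end', () => resolve({
        status: res.statusCode,
        bytes,
        ms: performance.now() - start
      }));
      res.on('error', reject);
    }).on('error', reject);
  });
}

// Repeat the request sequentially and return the median time in milliseconds
// (the upper of the two middle values when `runs` is even).
async function medianRequestTime(url, runs = 5) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    times.push((await timeRequest(url)).ms);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}

// Usage (assumes the server from the curl example is running):
//   medianRequestTime('http://localhost:3000/iiif/2/98-pyramid/full/3000,/0/default.jpg')
//     .then((ms) => console.log(`${ms.toFixed(0)} ms`));
```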

orangewolf commented 1 year ago

I think the answer to both my questions is likely "yes, this works the right way". If that's the case, then I'm happily in approval.

mbklein commented 1 year ago

Benchmarked using Benchmark.js, via this test script. Test images were fed to the benchmark script via a text file called s3Urls.txt, containing one s3://bucket/key URI per line.

$ ./tiff_metadata_benchmarks.js
sharp.metadata() x 4,141 ops/sec ±35.51% (24 runs sampled)
probe() x 2,372 ops/sec ±25.77% (27 runs sampled)
Fastest is [ 'sharp.metadata()' ]