IIIF-compatible service

weserv / images

Source code of wsrv.nl (formerly images.weserv.nl), to be used on your own server(s).

https://wsrv.nl/

BSD 3-Clause "New" or "Revised" License


IIIF-compatible service #335

Closed abubelinha closed 3 months ago

abubelinha commented 2 years ago

Not sure if this could be completely off topic here, because nobody else has mentioned IIIF in any previous issues.

I am completely ignorant about how difficult this could be. But with vips behind it, this is probably not impossible to implement.

My starting point for thinking about this was that I didn't find any option for requesting and caching specific parts of a remote image (i.e. a region of interest). Of course it is possible to cache that if the remote server provides it itself, as the iipimage server in the #334 example does.

But as weserv/images already offers many remote image manipulations through vips, why not include some ROI-options as well? That would make this project even more incredible, by letting any image on the internet be served using standard protocols like IIP or IIIF.

Example zoomable iip-provided image is here: https://iipimage.sourceforge.io/

Current behaviour:

(1) Part of the image manipulation (zooming into a ROI) is done by the original IIP server, and only then (2) is the result cached by images.weserv.nl (with some optional manipulations, but not all of those that IIP or IIIF permit):

FULL IMAGE: https://images.weserv.nl/?url=merovingio.c2rmf.cnrs.fr/iiif/PIA03883.pyr.tif/full/full/0/default.jpg
FULL (resized): https://images.weserv.nl/?url=merovingio.c2rmf.cnrs.fr/iiif/PIA03883.pyr.tif/full/500,/0/default.jpg
ROI 1: https://images.weserv.nl/?url=merovingio.c2rmf.cnrs.fr/iiif/PIA03883.pyr.tif/450,800,500,300/full/0/default.jpg
ROI 2: https://images.weserv.nl/?url=merovingio.c2rmf.cnrs.fr/iiif/PIA03883.pyr.tif/8300,4150,500,200/full/0/default.jpg

Proposal: provide an alternative IIIF compatible syntax i.e. like this:

https://images.weserv.nl/IIIF-service/whatever_urlencoded_remote_image_url/{region}/{size}/{rotation}/{quality}.{format}

This means weserv would take a whole remote image and then use vips to (1) achieve the transformations specified by {region}/{size}/{rotation}/{quality}.{format} before (2) caching the result.
The same goes for the IIP protocol, although I think it's less used nowadays.
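
To make the proposed URL layout concrete, here is a minimal Python sketch of how such a path could be split into its parts. The /IIIF-service/ prefix comes from the proposal above; the function name, the dataclass, and the example.org URL are hypothetical, and none of this is part of weserv.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass(frozen=True)
class IIIFRequest:
    identifier: str  # decoded remote image URL
    region: str
    size: str
    rotation: str
    quality: str
    format: str


def parse_iiif_path(path: str, prefix: str = "/IIIF-service/") -> IIIFRequest:
    """Split {identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    if path.startswith(prefix):
        path = path[len(prefix):]
    # Split from the right so that only the last four slashes are treated as
    # separators; the identifier is expected to be URL-encoded anyway.
    parts = path.strip("/").rsplit("/", 4)
    if len(parts) != 5:
        raise ValueError(f"not an IIIF image request path: {path!r}")
    identifier, region, size, rotation, last = parts
    quality, dot, fmt = last.rpartition(".")
    if not (dot and quality and fmt):
        raise ValueError(f"missing quality or format in: {last!r}")
    return IIIFRequest(unquote(identifier), region, size, rotation, quality, fmt)


request = parse_iiif_path(
    "/IIIF-service/https%3A%2F%2Fexample.org%2Fa.jpg/450,800,500,300/full/0/default.jpg"
)
print(request)
```

Splitting from the right keeps the parsing independent of what the encoded identifier contains.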

A bit more information about IIIF, although I am pretty sure you already know a lot more than me about this:

As implemented in the above example server: https://iipimage.sourceforge.io/documentation/iiif/
https://github.com/IIIF/awesome-iiif

Thanks a lot for this wonderful project anyway

kleisauke commented 2 years ago

Apart from the technical difficulties (e.g. info.json proxying and handling), I think providing an IIIF-compatible API in addition to the regular API would be beyond the scope of this project.

why not include some ROI-options as well?

Region of interest is just a rectangle crop in our API: https://images.weserv.nl/docs/crop.html#rectangle-crop

However, this will not work on our public service with the original 14400x9600 image you provided, as that would exceed our pixel limit of 71 megapixels. If you're able to host your own solution, you can remove this limit by doing this within the nginx configuration:

weserv_limit_input_pixels 0;
weserv_limit_output_pixels 0;

But please do this only for trusted input; there are PNG decompression bombs available that could exhaust all the available memory.

For example, using Docker:

FROM ghcr.io/weserv/images:5.x

RUN sed -i '1iweserv_limit_input_pixels 0;\nweserv_limit_output_pixels 0;' \
    /etc/nginx/imagesweserv.conf

docker build -t weserv/images .
docker run -d -p 8080:80 --shm-size=1gb --name=weserv weserv/images

After that, these URLs produce the same output as your ROI 1/2 URLs:

http://localhost:8080/?url=https://photojournal.jpl.nasa.gov/jpeg/PIA03883.jpg&cx=450&cy=800&cw=500&ch=300
http://localhost:8080/?url=https://photojournal.jpl.nasa.gov/jpeg/PIA03883.jpg&cx=8300&cy=4150&cw=500&ch=200

abubelinha commented 2 years ago

Providing my own service is too technical stuff for my skills, but thanks anyway.

Regarding the size limit info, you mean the first link I put is actually showing a reduced version of the full original image I am requesting? (because I see an image, anyway)

kleisauke commented 2 years ago

Providing my own service is too technical stuff for my skills, but thanks anyway.

Perhaps a shim API that rewrites the calls to our API can be implemented more easily? For example, your first ROI coordinates (/450,800,500,300/) can easily be rewritten as &cx=450&cy=800&cw=500&ch=300.
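
As a rough illustration of that rewrite, a shim could translate an IIIF region into the crop parameters along these lines. This is only a sketch: the function name is hypothetical, only absolute x,y,w,h regions and "full" are handled, and the other IIIF region forms (square, pct:) are rejected with a ValueError.

```python
from urllib.parse import urlencode


def iiif_region_to_weserv(image_url: str, region: str,
                          base: str = "https://images.weserv.nl/") -> str:
    """Rewrite an IIIF region into a URL using the cx/cy/cw/ch parameters."""
    params = {"url": image_url}
    if region != "full":
        # Raises ValueError for anything that is not four comma-separated integers.
        x, y, w, h = (int(value) for value in region.split(","))
        params.update(cx=x, cy=y, cw=w, ch=h)
    return base + "?" + urlencode(params)


print(iiif_region_to_weserv(
    "https://photojournal.jpl.nasa.gov/jpeg/PIA03883.jpg", "450,800,500,300"))
```

The size, rotation, quality and format segments would need similar mappings, which are left out here.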

you mean the first link I put is actually showing a reduced version of the full original image I am requesting?

Indeed, the "full" image on https://merovingio.c2rmf.cnrs.fr/iiif/PIA03883.pyr.tif/full/full/0/default.jpg (4096x2731) is not in fact the original image. Applying your second ROI coordinates (/8300,4150,500,200/) to that image is not possible.

abubelinha commented 2 years ago

Perhaps a shim API that rewrites the calls to our API can be implemented more easily? For example, your first ROI coordinates (/450,800,500,300/) can easily be rewritten as &cx=450&cy=800&cw=500&ch=300.

Indeed. Translating calls looks easy. That's exactly why I suggested to implement that translation server side. So your server understands IIIF calls ... just because IIIF is a standard communication protocol (I guess it's a good thing to make things compatible).

Indeed, the "full" image on https://merovingio.c2rmf.cnrs.fr/iiif/PIA03883.pyr.tif/full/full/0/default.jpg (4096x2731) is not in fact the original image. Applying your second ROI coordinates (/8300,4150,500,200/) to that image is not possible.

You mean it is not possible because weserv would need to download the full image before cropping the requested 500x200 part, don't you? 🤔 Mmm, now I understand what you meant by "size limits" in your first answer.

Well, the original IIIF-compliant image provider also imposes limits (maxWidth, maxHeight) on the maximum size you can download, as weserv does.

But I agree, this is a different problem: the request cannot be served because of weserv's limits on fetching the original image. But even if weserv imposes this kind of limit, I wouldn't see an issue here. As long as the JSON info service announces those limits, the service could still be IIIF compliant (I guess). It could just return an error if somebody requests a huge image (as it would, for example, when somebody requests a non-existent image): https://iiif.io/api/image/2.1/#error-conditions

If anybody feels those limits are too small, they always can stop using the service and start using their own servers. So, having size limits is not a problem at all, I think. Have I misunderstood you?

A different question is what you said about technical difficulties for constructing that .json information api. I would have expected that to be a minor difficulty compared to all the image processing work. But I am completely ignorant about both things.

In summary ... I know you already said IIIF is beyond the scope of this project. But could you elaborate on that a bit, just in case somebody else is interested in reading this? What are the main limitations for not trying it? (i.e. what you mean with "beyond the scope")

You don't find it related/useful at all for this project.
You don't have the time for trying it.
You know it would consume too much server resources (so you would prefer not to implement this even if the code is provided by somebody else who is interested in doing it)

Thanks a lot in advance for all your explanations.

kleisauke commented 2 years ago

That's exactly why I suggested to implement that translation server side

Handling it server-side would cause a lot of IIIF specific code. My feeling is that a "shim" (such as the ones listed at https://iiif.io/get-started/image-servers/#make-a-existing-image-server-iiif-compatible) is more appropriate in this case.

You mean it is not possible because weserv would need to download the full image before cropping the requested 500x200 part, don't you?

Yes, exactly. We could do a 500x200 crop on the 4096x2731 image, but the XY coordinates of 8300,4150 cannot be applied to this. Therefore you'll need to fetch the original 14400x9600 image, which would exceed our pixel limit on our public service.
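
The two constraints in play here can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that "71 megapixel" means exactly 71,000,000 pixels.

```python
PIXEL_LIMIT = 71_000_000  # assumption: the 71 megapixel limit, in pixels


def origin_inside(x: int, y: int, width: int, height: int) -> bool:
    """True if the crop origin (x, y) lies inside a width x height image."""
    return 0 <= x < width and 0 <= y < height


def exceeds_pixel_limit(width: int, height: int, limit: int = PIXEL_LIMIT) -> bool:
    """True if the image has more pixels than the limit allows."""
    return width * height > limit


# ROI 2 starts at (8300, 4150): outside the 4096x2731 derivative ...
print(origin_inside(8300, 4150, 4096, 2731))
# ... but inside the 14400x9600 original,
print(origin_inside(8300, 4150, 14400, 9600))
# which has 138,240,000 pixels and is therefore over the limit.
print(14400 * 9600, exceeds_pixel_limit(14400, 9600))
```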

If anybody feels those limits are too small, they always can stop using the service and start using their own servers. So, having size limits is not a problem at all, I think. Have I misunderstood you?

Indeed, size limits are not an issue if you're able to host your own solution.

A different question is what you said about technical difficulties for constructing that .json information api. I would have expected that to be a minor difficulty compared to all the image processing work.

Currently, we only fetch images, not JSON files. Constructing the JSON image information by ourselves is not an issue, but how should we handle the optional image data in that case? For example, if the license or rights statement is present on an upstream IIIF server, we should include it too.
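
For readers unfamiliar with it, the image information document is small. The sketch below shows one possible shape following the IIIF Image API 2.1 layout; it is an illustration, not something weserv does. The function name and example URLs are hypothetical, the upstream document is assumed to have been fetched and parsed already, and using maxArea to advertise an output pixel limit is an assumption too.

```python
from typing import Optional


def build_info_json(service_id: str, width: int, height: int,
                    max_area: Optional[int] = None,
                    upstream: Optional[dict] = None) -> dict:
    """Assemble a minimal IIIF Image API 2.1 info.json as a dict."""
    profile: list = ["http://iiif.io/api/image/2/level0.json"]
    if max_area is not None:
        # Assumption: advertise an output pixel limit through maxArea.
        profile.append({"maxArea": max_area})
    info = {
        "@context": "http://iiif.io/api/image/2/context.json",
        "@id": service_id,
        "protocol": "http://iiif.io/api/image",
        "width": width,
        "height": height,
        "profile": profile,
    }
    # Copy optional rights information only when the upstream document has it.
    for key in ("attribution", "license"):
        if upstream and key in upstream:
            info[key] = upstream[key]
    return info


info = build_info_json(
    "https://example.org/iiif/some-image", 14400, 9600,
    max_area=71_000_000,
    upstream={"license": "https://creativecommons.org/licenses/by/4.0/"},
)
print(info)
```

When the source is not an IIIF server there is no upstream document to copy from, so the optional keys are simply absent.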

What are the main limitations for not trying it? (i.e. what you mean with "beyond the scope")

With "beyond the scope of this project", I mean for this particular GitHub repository. You're the first to ask this, so I don't know if there is much interest in it either.

I'm a bit worried about this getting into https://xkcd.com/927/ territory, where we simply will end up with 5 IIIF image servers that'll do the same thing. Was there a reason why you couldn't use one of the IIIF-compatible servers listed at https://iiif.io/get-started/image-servers/?

abubelinha commented 2 years ago

Thanks for the long explanations and the final joke. Do you really have the feeling that IIIF is not becoming a standard? If so, then indeed I see your point. I did not have that feeling, but I am a complete newbie, so thanks for giving me that opinion.

As for the different servers you link, I am currently using one of them (iipimage), but only for part of my stuff. And I don't really know about the others, but I guess they will be similar in this sense: you store your images and you serve them.

What I found to be different in your project is that you cache and serve remote images. You are not the original storage point. That's the important part to me. I can use it as a cache for very different kinds of image storage, such as my institution's website, my GitHub page, my Google Drive, my iipimage server, my friend's shared Dropbox images ... or whatever other site is not blacklisted.

I'd love to have a common way to do ROI selection, image manipulation and caching of stuff stored in all those sites ... and this is where IIIF comes in. I had the feeling of IIIF being a more popular syntax to use.

Of course programmers can always adapt requests to the particular syntax of each service. But I guess that, if you already have many pieces of software which understand IIIF, it makes more sense trying to adapt weserv to IIIF, rather than adapting all those pieces of software to weserv.

If not ... why are we communicating in English right now? But yes, I agree on what you say about shims, and that makes sense indeed. Kinda Google Translate.

abubelinha commented 2 years ago

Indeed, size limits are not an issue if you're able to host your own solution.

I did not exactly mean that (although it is also true). I meant that, as long as I know weserv's size limits, it is my own responsibility to make sure I do not exceed them.

In the sense that I should not try to request a ROI from a huge image stored in Google Drive. Because if I do that, weserv will return me an error. But if the file is inside the limits, then I can happily use weserv to serve a cached ROI or any other manipulated image.

BTW ... how does weserv know the image is too big? Does it return (or could it return) an error message that is informative about that?

For example, if the license or rights statement is present on an upstream IIIF server, we should include it too

That's indeed a problem. I had not thought about IIIF returning that kind of info (I was just thinking about technical info about images).

But what are you doing about image licenses right now?

I think a license can't be provided in this scenario. Or it should be a single, maximally restrictive license for all images served, established by weserv (something like "as we don't know the original license, we serve you the cached image with no license at all: ask the original provider for more info about that"). That could seem inappropriate, but that's what you are doing right now, isn't it?

I mean, do you prevent me from hotlinking images from sites which have an explicit no-hotlink policy? Or do you just leave that moral concern to your users?

Of course there are sites which actively avoid hotlinking. But if they only put a written license note in their site, does weserv actually read it before deciding to cache or not to cache? ✋💀 That is the question. I guess it's the weserv user who must be responsible for that.

andrieslouw commented 2 years ago

Hi there, I see some good questions asked about policy and such, let me walk through a bit of history: This project started around 2006, 9 years before IIIF started. The goal was simple in the old days: provide an online service which can make thumbnails fast for users that already have the larger image stored somewhere.

The goal is still the same, and we've kept the same API the past ~16 years. This means links from 2006 still work, as long as the original image is also still online.

We are not in the business of hosting images, and we don't have a lot of policies, except for a few that prevent us from doing harm or getting copyright claims. If the source disappears, the image in our cache will disappear. If the image is reachable over http(s), it will (most often) be reachable through our service. If you abuse our service, we will do our best to stop this abuse to prevent other users from experiencing slow loading images. It is free. It is open source. You're welcome to contribute code, or suggestions or issues that could benefit us, our users, the world.

I'm old skool in this: the internet has been built by us all, and we will continue to build the internet together, given our limited spare time and effort. We're as liberal as we can be in what we accept, and conservative in what we do and offer, since we also need to support everything we build. Reliability and continuity of the service is our main goal, together with speed, ease-of-use, and sharing what we can and what we have learned along the way.

IIIF appears to have enough support but is not (yet) a leading standard; maybe it will be, and we'll keep an eye on it. You could develop a shim, please do so, and we'll make sure to link to it. Maybe, if we have enough understanding and time, we can make the effort to integrate it into the codebase and support it. But the original API is our first priority, as it is in use by many users on many sites, for many (16) years, and over 6 million resizes per hour.

There is no commercial model, there is no extensive team of developers, we don't have a business plan. Not monetizing this service is probably the reason we still exist. It's just sheer fun and commitment to the open community we all love and treasure.

And indeed: the user is responsible. We tell every lawyer or DMCA firm the same thing. Remove the original image, and it will be removed by us. We have no intention of hosting, as the world is already complex enough without hosting. And frankly, the BSD license is the only license I care about in this project. But I will always do my utmost to protect creators of content, as long as they're real persons, and am actively supporting those who do have issues with the service we provide.

abubelinha commented 2 years ago

👍 All very reasonable. Thanks for your deep explanation @andrieslouw

kleisauke commented 3 months ago

I hope this information helped. Please feel free to re-open if questions remain.