[FR] Generate thumbnails to improve loading times with large images

Mason-McGough commented 3 years ago

Proposal Summary

I wish to use FiftyOne to organize datasets of large images (>=50MP), but the app is very slow when loading large images. It takes several minutes to load the thumbnails on a single page and it is even slower when scrolling. Would it be possible to generate smaller thumbnails server-side before serving them to the client?

Motivation

Faster load times and quicker interactions for large images

What areas of FiftyOne does this feature affect?

[x] App: FiftyOne application
[ ] Core: Core fiftyone Python library
[x] Server: FiftyOne server

Details

The resizing could be solved with PIL, OpenCV, scikit-image, ImageMagick, or a similar image processing tool, perhaps whichever is most efficient or introduces the fewest dependencies. If server-side disk space is a concern, the resized images could be stored using a least recently used cache.

Willingness to contribute

The FiftyOne Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

[x] Yes. I can contribute this feature independently.
[x] Yes. I would be willing to contribute this feature with guidance from the FiftyOne community.
[ ] No. I cannot contribute this feature at this time.

brimoor commented 3 years ago

Hi @Mason-McGough 👋 Thanks for the feature request. Thumbnails for the image grid definitely make sense for performance.

Question: I assume that when you click on an image to enter the expanded view (one image at a time), in that case you'd always like to see your full resolution image?

brimoor commented 3 years ago

(Implementation discussion)

Thus far, we have avoided storing any modifications/transformations of the source media for datasets, whether in the database or on disk, in temporary or permanent locations. So, we have some choices for a feature like this. Here are a few questions to get us started:

Should thumbnails be generated at sample creation time or at App-load time? If they are computed at App load time and thumbnails can be stored permanently, should we expose a compute_thumbnails() method similar to compute_metadata() to allow the user to opt-in to computing thumbnails for an entire dataset offline?

Must thumbnails always be used in the App grid? Or will they only be used if they exist?

Can the user configure the resolution of thumbnails? Can the user opt out of using thumbnails (since it is possible to adjust the grid size to show only a small number of quite large images)?

How are thumbnails stored?

Permanently in the DB
Permanently on disk in the same locations as the source media?
Temporarily on disk in a cache (and when is this cache cleared?)

Mason-McGough commented 3 years ago

Hi Brian :wave: Thanks for your response! To answer your first question, I personally would prefer to see the full resolution image only when I am viewing in expanded view. At the scale I'm working at (roughly 8000x5000) that is ideal since grid view is slow to the point of unusable. I can't say that's the case for everyone though. For other datasets like COCO the grid view has worked just fine for me.

As for when to generate thumbnails, I would guess at App-load time, maybe with pagination. That way for very large datasets the server isn't suddenly tasked with generating thousands of thumbnails all at once. That compute_thumbnails() method sounds like a good plan to me unless there is any reason not to expose it.

Perhaps thumbnail creation could be an opt-in feature? It makes sense to have that option if your grid size is only showing a handful of images like you said. Maybe some logic in the App could decide to use thumbnails if a) the thumbnail flag is set and b) it can either find a thumbnail in cache or successfully generate a thumbnail. Otherwise it will use the original source image. The plus side is it will always display the original as a failsafe, but the downside is added complexity.

For thumbnail storage, I was initially picturing temporary storage in a cache, on disk and/or in memory, creating new thumbnails on-demand. The cache could have a configurable max size and clears out the least recently used members when a new entry is added. The plus side is it should never (:crossed_fingers:) fill the entire volume this way, at the cost of adding extra parameters. A bigger downside I can foresee is that if you aren't visiting the same images over and over, the cache would roll over pretty consistently, which possibly defeats the purpose. Storing in DB instead would make a lot of sense if you are expecting to see the same images over a long period of time. Would you say the typical FiftyOne user is visiting the same image sets over and over again or are they just passing through once or twice? Curious to hear your thoughts on that.
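A minimal sketch of the size-bounded, least-recently-used cache described above, using only the Python standard library. The class name `ThumbnailCache` and the `max_bytes` parameter are hypothetical and are not part of FiftyOne; thumbnails are assumed to be held as encoded bytes in memory.

```python
from collections import OrderedDict


class ThumbnailCache:
    """Byte-bounded least-recently-used cache (hypothetical helper)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._entries = OrderedDict()  # key -> thumbnail bytes, oldest first

    def get(self, key):
        """Return the cached bytes, or None on a miss."""
        data = self._entries.get(key)
        if data is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key, data):
        """Store a thumbnail, evicting least recently used entries to fit."""
        if key in self._entries:
            self.total_bytes -= len(self._entries.pop(key))
        if len(data) > self.max_bytes:
            return  # larger than the whole cache; skip caching it
        self._entries[key] = data
        self.total_bytes += len(data)
        while self.total_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.total_bytes -= len(evicted)
```

As noted above, a cache like this only pays off when the same images are revisited; a session that keeps pulling new random subsets would evict entries about as fast as it adds them.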

brimoor commented 3 years ago

Cool, thanks for sharing your thoughts!

To answer your question first: our goal with FiftyOne is to support massive datasets, so we definitely want to design for a user that is performing lots of queries that are pulling different random subsets of a big dataset in the App. So, no assumption on an upper bound on the number of images that a user may be viewing in a session. That said, some caching definitely makes sense, as viewing the same images repeatedly in a session could definitely happen.

We had some discussion about this feature internally and basically arrived at a similar place as your recommendation. Specifically:

Compute thumbnails JIT when requested by the App
Store the thumbnails in an internal, ephemeral cache with some configurable size parameters
Provide a config option to enable or disable thumbnailing
If a thumbnail cannot be generated for any reason, fall back to the full resolution image
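The four points above could be combined roughly as follows. This is only a sketch under stated assumptions: `resolve_grid_media`, the `thumbnails_enabled` flag, the dict-like `cache`, and the caller-supplied `make_thumbnail` function are all hypothetical names, not existing FiftyOne APIs.

```python
def resolve_grid_media(filepath, thumbnails_enabled, cache, make_thumbnail):
    """Return (kind, payload) for one grid request.

    `cache` is any dict-like mapping of filepath -> thumbnail bytes and
    `make_thumbnail` is a caller-supplied function; both are hypothetical.
    """
    if not thumbnails_enabled:
        return "full", filepath  # feature disabled via config

    thumb = cache.get(filepath)
    if thumb is None:
        try:
            thumb = make_thumbnail(filepath)  # computed just in time
        except Exception:
            return "full", filepath  # fall back to the source image
        cache[filepath] = thumb

    return "thumbnail", thumb
```

A failed generation is deliberately not cached here, so a later request retries it; whether that is desirable for permanently unreadable files is a design choice left open.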

We had some discussion about possibly allowing the user to configure the thumbnail resolution as well (e.g., thumbnails with a maximum width of 256 pixels).
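For illustration, a width-capped thumbnail size could be computed as below. The function name `thumbnail_size` and the choice never to upscale small images are assumptions; the 256 pixel default simply mirrors the example figure above.

```python
def thumbnail_size(width, height, max_width=256):
    """Scale (width, height) down to at most max_width, keeping aspect ratio."""
    if width <= max_width:
        return width, height  # never upscale images that are already small
    new_height = max(1, round(height * max_width / width))
    return max_width, new_height


print(thumbnail_size(8000, 5000))  # (256, 160)
```

For the roughly 8000x5000 images mentioned earlier in the thread, this cuts the pixel count by a factor of about 1000.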

A complicating factor is that with features like patches views, the grid may not contain entire images but rather image patches (currently this feature shows entire images, but in the next release it will start showing only "zoomed" patches). Depending on the size of the patch, a patch extracted from the thumbnail image could be far too small. So, we'd likely want some logic that would choose whether to extract patches from the full resolution or thumbnail image for a particular patch, and, more generally, perhaps compute multiple resolution thumbnails. Interesting design challenge.

Mason-McGough commented 3 years ago

That is an interesting idea. Could you answer a few questions for me?

Are the patches going to be extracted client-side or server-side?
How would the patches be represented and stored server-side?
What are the reasons to prefer creating patches from thumbnails instead of the full-resolution image?
When you click on a patch, does the expanded view show only the enlarged patch or the full-resolution image?

Wondering if the following workflow makes sense:

Extract patches from full-resolution image
For each patch, compute a thumbnail

In this way, each patch is treated like a full-resolution image with respect to its thumbnail.

benjaminpkane commented 3 years ago

Hi @Mason-McGough.

The patches are created client-side, with canvas. We do not create any assets (thumbnails) server side at the moment.

> What are the reasons to prefer creating patches from thumbnails instead of the full-resolution image?

At least for an initial implementation, I don't think I would implement server-side thumbnails for patches, although they would be useful. Whether or not thumbnails are made on the server for patches, what is shown in the App should never be lossy, i.e., the dimensions of the image/patch on screen should not exceed those of the source thumbnail.

The worst case is a very large image with a very small detection. Server-side thumbnails for patches would be useful in this scenario, as opposed to the server sending the entire source image only to show a small percentage of its pixels.
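The "never lossy" condition above can be sketched as a simple check. This is an assumption-laden illustration: `patch_fits_display` and its parameters are hypothetical names, and the thumbnail is assumed to preserve the source aspect ratio so that one scale factor applies to both axes.

```python
def patch_fits_display(image_width, thumb_width, patch_size, display_size):
    """Return True if a patch cropped from the thumbnail needs no upscaling.

    patch_size is (w, h) in full-resolution pixels; display_size is the
    (w, h) the patch occupies on screen. All names are hypothetical.
    """
    patch_w, patch_h = patch_size
    disp_w, disp_h = display_size
    # Pixels available in the thumbnail crop: patch * thumb_width / image_width.
    # Cross-multiplied so the comparison stays in integer arithmetic.
    return (
        patch_w * thumb_width >= disp_w * image_width
        and patch_h * thumb_width >= disp_h * image_width
    )
```

With an 8000 pixel wide source and a 256 pixel wide thumbnail, a hypothetical 100x100 detection covers only about 3 thumbnail pixels per side, so the check fails for any realistic grid cell.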

> When you click on a patch, does the expanded view show only the enlarged patch or the full-resolution image?

We will always load the full source media (image or video). Zooming, panning, and cropping to content (e.g. the patch) is coming out soon. So the expanded view will always use the full image/video.

Mason-McGough commented 3 years ago

I see. Thanks for answering my questions!

If the patches are generated client-side, then it seems like the patches for grid view should be created from the thumbnail images. Otherwise it would require loading the full-resolution image anyways and defeat the purpose of thumbnails. The expanded view would always fetch the full-resolution image like you said, whether you click on a regular image or a patch.

The big downside is the "large image with small detection" scenario you and Brian mentioned. Generating a small patch from a thumbnail might produce a very blurry image. Do you know how that could be best addressed without server-side thumbnails for now?

brimoor commented 3 years ago

> The big downside is the "large image with small detection" scenario you and Brian mentioned. Generating a small patch from a thumbnail might produce a very blurry image. Do you know how that could be best addressed without server-side thumbnails for now?

Yeah it is a tricky problem. Our only idea at the moment is to allow the server-side thumbnailing feature to generate multiple thumbnails of different resolutions for a given image, if necessary, to provide a nice trade-off between data transfer and patch resolution for cases like large-image-with-small-detection.
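One way the multi-resolution idea might work is to pick, per patch, the narrowest available rendition that still covers the on-screen width without upscaling. The sketch below is hypothetical: `pick_source_width`, the list of thumbnail widths, and the fallback to the full image are assumptions, not a description of any FiftyOne behavior.

```python
def pick_source_width(image_width, patch_width, display_width, thumb_widths):
    """Return the width of the smallest rendition that keeps the patch sharp.

    Renditions are identified by their total width in pixels. If no
    thumbnail is large enough, the full image width is returned.
    """
    for width in sorted(thumb_widths):
        if width >= image_width:
            break  # not smaller than the source, so no benefit
        # The patch spans patch_width * width / image_width pixels at this level.
        if patch_width * width >= display_width * image_width:
            return width
    return image_width
```

For example, with an 8000 pixel wide image, renditions at 256, 1024, and 4096 pixels, and a 200 pixel wide grid cell, a 2000 pixel wide patch would be served from the 1024 pixel rendition, while a 100 pixel wide patch would still need the full image.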

Mason-McGough commented 3 years ago

I wonder how that trade-off could be used. Would it make sense to have some kind of threshold on, for instance, minimum-patch-to-image-size ratio? If it dips below a set value, send the whole image to the client to generate patches. I'm not sure that I like that since it makes performance unpredictable. I guess we could also put that trade-off in the user's hands with the explicit option: a) send thumbnails at the cost of resolution or b) send full images at the cost of performance. I'm not sure how else to exercise that trade-off in practice.
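A rough sketch of the two options above: an automatic threshold on the patch-to-image area ratio, with an explicit user override. The function name, the `mode` values, and the 1% default threshold are all hypothetical placeholders chosen for illustration.

```python
def choose_grid_source(patch_area, image_area, min_ratio=0.01, mode="auto"):
    """Decide whether a grid patch is cut from the thumbnail or the full image.

    mode="thumbnail" or mode="full" is an explicit user choice;
    mode="auto" applies the (arbitrary) min_ratio threshold.
    """
    if mode in ("thumbnail", "full"):
        return mode
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode!r}")
    ratio = patch_area / image_area
    return "full" if ratio < min_ratio else "thumbnail"
```

The automatic mode carries the unpredictability mentioned above, since two samples on the same page can take very different paths; the explicit modes trade that for a fixed cost in either resolution or transfer size.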

niclaswue commented 2 years ago

Would love to see this feature implemented, I am having some trouble with the grid view even at 1024x1024 resolution :/

adriantre commented 2 years ago

Great feature request! I would like to add a use case: when using GeoTIFFs, the image may consist of more than three color bands (e.g., R, G, B, and infrared), or even bands outside of the visible spectrum. In this case, it would make sense to distinguish the training image from the displayed image in FiftyOne. In many of these cases, it is common to have multiple files, one RGB image and one multiband image, stored on disk.

So in this case it may not be necessary for the display image/thumbnail image to be computed at runtime.

benjaminpkane commented 1 month ago

Multiple media fields is the recommended approach

voxel51 / fiftyone