gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0

`Image` component requests #466

Closed pngwn closed 9 months ago

pngwn commented 2 years ago

Is your feature request related to a problem? Please describe.

We are getting a lot of feature requests for the different interactive Image variants. Historically we had numerous different tools to handle the different kinds of Image editing functionality. This has improved somewhat, but the Image component is less feature-rich than it used to be. Sadly, over time the new (kinda) Image component has become difficult to maintain and extend, and it needs substantial refactoring to realise its full potential.

We are also finding the current signatures of pre- and post-process limiting. The challenge here is that the sensible thing would be to issue a breaking change and make a clean break with the past, but we don't want to introduce too much churn for users.

This issue will collate all feedback we have received so far (assuming I can find it all) and act as a single place to discuss features and design of a new unified image component.

overview

Broadly speaking the Image component (as an interactive input) has two key parts: the source of the image and the editing capabilities. The rewrite that will stem from this issue will preserve the different inputs (and maybe there are more that people would like to see) but unify the editing tools into a single, simple (hopefully) GUI. The high-level thinking is that the Gradio developer will be able to toggle and constrain the various features one by one if necessary (with defaults and templates making this even simpler for users who do not need such granular control).

References to "Gradio API" refer to controlling the feature via the Gradio Python API when creating the app. References to "GUI" refer to end users controlling the feature in the browser when interacting with the tool.

inputs/source

One thing that has crossed my mind is allowing multiple inputs. This would allow Gradio app authors to be very flexible about which kinds of Image sources are available, which will work well for some more general models. It could be controllable via the GUI (defaulting to a blank background but with buttons/toggles to enable different source modes).

Note: source=canvas would be deprecated as everything can be a canvas in the new world.

Are there other possible inputs we should consider here?

editor tools

Currently the Gradio Image component is a simple raster/bitmap graphics editor but there is no reason we cannot support certain vector features. I would be wary of attempting any kind of comprehensive vector tools (specifically things like modifying paths + curves, creating new shapes from the intersection/union of multiple shapes, etc.) but we could support some simple shape tools with transforms (translate/rotate/resize). It would probably make sense to start with rasters only because combining vectors and raster images introduces some complexities that we have no choice but to push onto the user (such as needing to rasterise vectors and flatten layers in order for image filters to work as expected).

We have had lots of requests so here goes:

general

Some more general things need to be handled better. The main thing I can think of here is the size of the canvas. It is a little better today than it was yesterday but still not ideal.

fullscreen mode

We have never really had this. The previous 'full screen' wasn't really full screen, but we should add this.

canvas size

I'm not 100% sure what the best way to approach the canvas size is. I definitely think we need to respect any options passed into the Gradio API, so app authors can set the most appropriate canvas size (and ratio) for their model, but I'm not sure about other cases.

Currently we size the canvas based on either the 'source' or, if there isn't one, the screen size. So if a user uploads a 500x500 image, then that will be the size of the canvas (scaled in the browser to account for device pixel ratio), but this might not be ideal as very large images could slow down the predictions. We could accept a max width/height and never go above that, to ensure we aren't unwittingly sending huge images back to the server to be processed.
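To make the max width/height idea concrete, here is a minimal sketch in plain Python. The function name `fit_canvas` and the `max_width` / `max_height` parameters are hypothetical and not part of the Gradio API; the sketch only assumes that an oversized image should be scaled down with its aspect ratio preserved and never scaled up.

```python
def fit_canvas(width, height, max_width=None, max_height=None):
    """Scale (width, height) down to fit the optional limits, keeping the aspect ratio.

    Hypothetical helper for illustration only; not part of Gradio.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    scale = 1.0  # 1.0 means "leave as is"; we never upscale
    if max_width is not None:
        scale = min(scale, max_width / width)
    if max_height is not None:
        scale = min(scale, max_height / height)

    # Round to whole pixels and keep each side at least 1px.
    return max(1, round(width * scale)), max(1, round(height * scale))


small = fit_canvas(500, 500, max_width=1024, max_height=1024)
large = fit_canvas(4000, 2000, max_width=1024, max_height=1024)
```

With limits of 1024 in both directions, the 500x500 upload from the example above is left untouched, while a 4000x2000 image would be sent to the server at 1024x512.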

Would love to get people's thoughts on this one.

performance

Performance of the current component is OK, but there are some performance issues that result from a number of things. They can be addressed in a rewrite, as we will almost certainly need to switch to WebGL to implement some of these features in a performant manner (while maintaining good UX). Calling them out here for posterity; there is not a great deal to discuss.

pre and post-process

These signatures need to change: they aren't working right now and things aren't going to get any better. This is a pretty significant breaking change because the Image component is our most used component. We will have to discuss how we manage this.

I think the Image component should switch to always returning a dictionary with a series of keys. We have had numerous requests about returning certain layers separately and others together, so we can discuss the specifics in this thread, but something like:

{
  "image": "background_image.whatever",
  "mask": "mask_image.whatever",
  "sketch": "...",
  ...
}

There are questions around the exact shape of this. What if we have multiple masks? Should that be a list/array on the mask key or should every layer have its own key? Should the return be a list of dicts instead, containing meta information about each layer? How does an app author figure out what each layer is for (take the example of three masks again)? Should we also return a composite image in addition to the separate layers?
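As one possible answer to these questions, here is a rough sketch of the "list of dicts with meta information" option, written as dataclasses. Everything in it is an assumption for discussion: the `Layer` and `ImagePayload` names, the `name` / `kind` fields and the file names are made up, and nothing here reflects what Gradio actually returns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Layer:
    name: str   # label chosen by the app author, e.g. "foreground"
    kind: str   # "background", "mask" or "sketch"
    image: str  # path or reference to this layer's image data


@dataclass
class ImagePayload:
    layers: List[Layer] = field(default_factory=list)
    composite: Optional[str] = None  # optional flattened image of all layers

    def by_kind(self, kind):
        """Return every layer of the given kind, in drawing order."""
        return [layer for layer in self.layers if layer.kind == kind]

    def get(self, name):
        """Return the layer with this name, or None if there is no such layer."""
        for layer in self.layers:
            if layer.name == name:
                return layer
        return None


# The "three masks" case: each mask carries a name, so the app author
# can tell them apart without relying on their position in the list.
payload = ImagePayload(
    layers=[
        Layer("photo", "background", "background_image.png"),
        Layer("foreground", "mask", "mask_foreground.png"),
        Layer("background", "mask", "mask_background.png"),
        Layer("unknown", "mask", "mask_unknown.png"),
    ],
    composite="composite.png",
)
masks = payload.by_kind("mask")
```

Naming each layer is what lets the app author figure out what each mask is for, and the optional `composite` field covers the case where a flattened image is wanted alongside the separate layers.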

Would be good to get people's thoughts on this one as well.


Issues

Features

Feature requests but should be custom components?

Bugs

omerXfaruq commented 2 years ago

this issue looks very old, what's its status?

pngwn commented 2 years ago

Still important just not as important as other things.

charlesfrye commented 2 years ago

Thanks for your hard work on an awesome tool! I just wanted to chime in on why this is important to me, as a user.

The image editor was one of my favorite Gradio 2.x features. It allowed me to "play" with my computer vision models in the same way that NLP folks have been able to play with theirs. I used it very fruitfully to probe and understand the failure modes of an OCR model. This makes it a killer feature to combine with flagging as part of an "exploratory model analysis" workflow, where Gradio can shine as a central component.

Without the full editor, I have much less reason to prefer Gradio for this over other libraries for rapid model-centric app development, like Streamlit. I'd also like to register that it was very confusing to see the documentation for the editor choice in the inputs.Image class's tool kwarg totally unchanged, still referring to a "full-screen editor", even though the feature was intentionally removed.

Cheers, and thanks for making a really useful library!

abidlabs commented 2 years ago

Thanks @charlesfrye for the very useful feedback! We are definitely planning on bringing it back, but most likely using our own implementation so that we have more control over it. Would you be able to tell us which parts of the editor were most useful for you? Blurring / cropping / coloring / etc.?

hysts commented 2 years ago

I also found the image editing function very useful to modify input images to check the robustness of models, and I'm glad to hear there's a plan to bring it back.

In my case, rotating, flipping, blurring, cropping, changing aspect ratios, and adding noise were useful for checking the performance of object detection models, image classification models, etc. Drawing tools were also useful for partially or completely occluding objects in images.

Some of the features that were missing in the previous image editor and that I wanted are discussed in the following issues:
https://github.com/gradio-app/gradio/issues/1020
https://github.com/gradio-app/gradio/issues/1410

A little while ago, I was thinking of making an app for MatteFormer, but the image matting task requires a three-colored mask to specify foreground, background, and ambiguous areas, and it was impossible to create such masks even with the gradio v2 image editor, so I decided not to. Also, I recently made an app for Text2Human, and it would be better if we could edit label images directly. That is possible with the GUI app in the original repo, but not with the gradio image editor.

charlesfrye commented 2 years ago

@abidlabs Happy to help! The most useful transformations were adding noise, blurring, adding text, and erasing/drawing.

Adding noise and blurring are nice generic robustness tests, but they are relatively easy to do in a library like torchvision. Erasing, drawing, and adding text, on the other hand, are much harder to automate and so aren't as readily available in existing modeling libraries.

For "gradio as a tool for exploring models", I think it is generally the case that those more interactive editing tools would be highest value-add.

Rotations and flips were less useful, but that may be specific to the use case I spent the most time with -- the OCR model expected text to be mostly oriented correctly.

pngwn commented 1 year ago

I have updated the parent issue to try to capture the various requests we have had and start a conversation about how we design this. Please take a look and provide any feedback, it would be very much appreciated!

abidlabs commented 1 year ago

This looks great @pngwn and definitely captures the vast majority of user feedback that I've heard. A few thoughts:

One thing that has crossed my mind is allowing multiple inputs. This would allow Gradio app authors to be very flexible about which kinds of Image sources are available, which will work well for some more general models. It could be controllable via the GUI (defaulting to a blank background but with buttons/toggles to enable different source modes).

This is something we've heard a lot. In the Python API, if users pass in a list for the source parameter, it would be nice if the GUI allowed them to toggle between these options.
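A minimal sketch of how a single string or a list for the source parameter could be normalised before the GUI decides whether to show a toggle. The helper `normalize_sources`, the `KNOWN_SOURCES` tuple and the returned keys are all hypothetical, chosen only for illustration; they are not existing Gradio code.

```python
# Assumed set of source names, for illustration only.
KNOWN_SOURCES = ("upload", "webcam")


def normalize_sources(source):
    """Accept one source name or a list of them; return settings for the GUI."""
    sources = [source] if isinstance(source, str) else list(source)
    if not sources:
        raise ValueError("at least one source is required")

    unknown = [s for s in sources if s not in KNOWN_SOURCES]
    if unknown:
        raise ValueError(f"unknown source(s): {unknown}")

    # Drop duplicates while keeping the order the app author gave.
    ordered = list(dict.fromkeys(sources))
    return {
        "sources": ordered,
        "default": ordered[0],             # first entry is shown initially
        "show_toggle": len(ordered) > 1,   # only offer a toggle if there is a choice
    }


single = normalize_sources("upload")
several = normalize_sources(["webcam", "upload", "webcam"])
```

Existing apps that pass a single string would keep working unchanged under this scheme, and the toggle would only appear when the app author opts in by passing a list.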

editor tools

LGTM. One additional request we've heard is the ability to type text onto an image. This is useful for OCR-type models. See @charlesfrye's comments in the thread above, for example.

pre and post-process

It seems that some users strongly prefer dealing with a single image, while others require separate layers for the image, mask, and sketch. I think we should actually provide this as an option that can be controlled via the Python API. The Image component could take in a parameter (something like collapse_layers), which, if set to True, would return a single image to the backend function. If False, it would return a dictionary with separate images for the keys image, mask, and sketch.
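A rough sketch of what such a collapse_layers switch could do, in plain Python with no image library. It assumes each layer is a 2D list of straight-alpha RGBA tuples (0-255) of equal size and that the mask and sketch are drawn over the image in that order; the function names and that ordering are assumptions, not Gradio behaviour.

```python
def alpha_over(top, bottom):
    """Composite one RGBA pixel over another (straight alpha, channels 0-255)."""
    top_a, bottom_a = top[3] / 255, bottom[3] / 255
    out_a = top_a + bottom_a * (1 - top_a)
    if out_a == 0:
        return (0, 0, 0, 0)  # both pixels fully transparent
    rgb = tuple(
        round((t * top_a + b * bottom_a * (1 - top_a)) / out_a)
        for t, b in zip(top[:3], bottom[:3])
    )
    return rgb + (round(out_a * 255),)


def preprocess(layers, collapse_layers=True):
    """Return one flattened image, or the untouched dict of separate layers."""
    if not collapse_layers:
        return layers

    result = layers["image"]
    for key in ("mask", "sketch"):  # assumed drawing order: mask, then sketch
        result = [
            [alpha_over(t, b) for t, b in zip(top_row, bottom_row)]
            for top_row, bottom_row in zip(layers[key], result)
        ]
    return result


# A 1x1 example: opaque red image, empty mask, opaque blue sketch stroke.
layers = {
    "image": [[(255, 0, 0, 255)]],
    "mask": [[(0, 0, 0, 0)]],
    "sketch": [[(0, 0, 255, 255)]],
}
flat = preprocess(layers, collapse_layers=True)
separate = preprocess(layers, collapse_layers=False)
```

In the 1x1 example the empty mask leaves the red pixel alone and the opaque sketch stroke covers it, so the flattened result is a single blue pixel, while `collapse_layers=False` hands back the three layers as they were.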

I didn't follow what you meant about the "example of three masks"

pngwn commented 1 year ago

@abidlabs regarding the three masks, it is an example pulled from this comment:

A little while ago, I was thinking of making an app for MatteFormer, but the image matting task requires a three-colored mask to specify foreground, background, and ambiguous areas, and it was impossible to create such masks even with the gradio v2 image editor, so I decided not to.

A collapse_layers kwarg sounds like a good idea.

I'll add text to the feature list.

GalaxyTimeMachine commented 1 year ago

I came here to add a request for a mask eraser. It's sometimes a pain to have to reverse and delete the whole mask when you only want to be able to erase a small part of it.

johko commented 1 year ago

Hi, I would love to see the possibility to have an example for a masked image input in a space. Even just being able to put in an empty mask would already help in my opinion, as mostly the really important thing for examples is to have an image to start from without having to upload anything.

Starhkz commented 1 year ago

I recently started using gradio, and it's been really helpful. My major challenge is reducing the brush size. Fortunately the issue was mentioned earlier. Are there any updates on any of these?

  • sketching

    • brush size
    • brush colour - Gradio API: open, locked to a specific colour, locked to a set of colours
    • brush texture (???)

pngwn commented 1 year ago

https://github.com/gradio-app/gradio/issues/2903

Alchete commented 1 year ago

@pngwn I'm strongly in favor of improving this component as, whether intended or not, it's become the de-facto interface for Stable Diffusion and folks are currently zooming their browser windows to see what they're masking. BTW, I'd also recommend looking at InvokeAI's implementation of its unified canvas feature.

Even just adding shortcuts and a functioning zoom to the existing Image component would go a long way toward filling the gap short term. Since I come from the Desktop UI world, can you or someone explain if the Image component currently supports "focus" and "keyboard" listeners? And if not, which package one might use to support those features that would be acceptable on the Gradio-side? I'd be willing to tinker with this on my own. Many thanks.

anapnoe commented 1 year ago

I don't know if this is the right place to ask, but why does the image editing component use 5 canvases instead of one? It seems very inefficient. Maybe someone can explain to me why we need to allocate 4x the memory, which is not free for large canvases. This component is very useful, so it should be optimized first before anything else is added on top:

[x] remove unnecessary canvases
[x] proper undo redo option attribute field to constrain memory footprint (history)

then any tool adds a cherry on the pie 😎
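For the history point, a minimal sketch of an undo/redo stack whose memory footprint is capped by a step limit. The `History` class and its `max_steps` option are hypothetical names for illustration, and snapshots are plain Python values standing in for whatever canvas state the editor would store.

```python
from collections import deque


class History:
    """Bounded undo/redo history; the oldest snapshots are dropped first."""

    def __init__(self, max_steps=20):
        # deque(maxlen=...) discards the oldest entry once the limit is reached
        self._undo = deque(maxlen=max_steps)
        self._redo = []

    def push(self, snapshot):
        """Record the state as it was before a new edit; this clears redo."""
        self._undo.append(snapshot)
        self._redo.clear()

    def undo(self, current):
        """Return the previous state, or `current` if there is nothing to undo."""
        if not self._undo:
            return current
        self._redo.append(current)
        return self._undo.pop()

    def redo(self, current):
        """Return the next state, or `current` if there is nothing to redo."""
        if not self._redo:
            return current
        self._undo.append(current)
        return self._redo.pop()


history = History(max_steps=2)
for state in ("a", "b", "c"):   # "a" falls out once "c" is pushed
    history.push(state)

state = "d"
state = history.undo(state)      # back to "c"
state = history.undo(state)      # back to "b"
state = history.undo(state)      # nothing left, stays at "b"
after_undo = state
after_redo = history.redo(state)
```

Capping the number of stored snapshots means memory use grows with the step limit times the canvas size, and not with the length of the editing session.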

pngwn commented 1 year ago

@anapnoe there are reasons but they aren't particularly good ones. This will be addressed in the rewrite. The performance of the current sketch tool is quite poor currently, especially with large images.

missionfloyd commented 1 year ago

How about a None tool option? Sometimes we just need to upload images.

cceyda commented 1 year ago

On Windows using Chrome I can drag and drop an image from another tab/window into gradio, but on Mac this doesn't work. I really like doing this for quick testing of things. For example, I can search for cat pics in one tab and drag and drop the images to see if my animal classifier works.

henryruhs commented 1 year ago

When I set tool = None I actually don't want to have any tool. Whoever thought having tools on by default was a good idea might explain this to me.

This component is a good example of feature creep. I highly recommend breaking it into smaller components such as ImageEditor, SketchPad, WebcamShot and Image.

Options that turn a component into a totally different thing result in a bad API and, from my experience, a bad codebase with tons of paths and conditions.