thygate / stable-diffusion-webui-depthmap-script

High Resolution Depth Maps for Stable Diffusion WebUI
MIT License
1.72k stars 159 forks source link

Adding explicit api #296

Closed graemeniedermayer closed 1 year ago

graemeniedermayer commented 1 year ago

Definitely very bare bones. Some autogenerated docs are available from http://127.0.0.1:7860/docs (or wherever auto1111 is hosted).

updates to readme are necessary but it's probably better to write after api is set in stone.

I was debating a lot about whether "/depth/" route might cause conflicts with other extensions in the future. It's not clear to me that this would be safe for public hosting. I haven't done any extra data sanitation or any security measures.

It might also be good to have a more stable base api, then have experimental features be more fluid. There's future things like 3d models from this api that would be very exciting.

Notes get_defaults works but there's likely a better method (I don't like using internal dictionaries)

def get_defaults():
    default_gradio = main_ui_panel(True).internal
    defaults = {}
    for key, value in default_gradio.items(): 
        defaults[key]= value.value
    return defaults
thygate commented 1 year ago

I don't really mind grabbing the defaults this way, it is like you said robust against changes..

For the /depth route, I would prefer keeping it but i'm not aware on any issues that might cause.

@semjon00 agreed, enum seems like the best option.

Thanks for this @graemeniedermayer , I tried a few times to get it working in the past but never did.

semjon00 commented 1 year ago

@graemeniedermayer Thank you! This is cool, I definitely see how this could be used. @thygate Ooooh, I think it is kinda early for merging it. Nvm, we could fix it after this is merged, because nothing depends on it yet, AFAIK. But this needs some more polish to avoid having issues in the future.

thygate commented 1 year ago

I got a bit too exited there, my bad. I've been wanting to have a direct api for depth for a while ..

Working great!

graemeniedermayer commented 1 year ago

But this needs some more polish to avoid having issues in the future.

Yah I definitely think this is the case. For large numbers of requests, I'm not sure if get_defaults would cause a small memory leak because the gradio components aren't cleared.

It shouldn't have any side effects.

semjon00 commented 1 year ago

@graemeniedermayer FYI current main has a rework of funnel inputs. I think this raises ceiling for potential API code/functionality quality. The funnel defaults to default parameter value if no value for a parameter was supplied. Lol that was mouthful 😆

graemeniedermayer commented 1 year ago

Awesome! This is excellent for the API!

This should also be helpful for having a print state function for debugging or load/save state functionality. I don't know either are important right now but it means future proofing!

semjon00 commented 1 year ago

@graemeniedermayer I have gotten all-in with the new round of funnel refactor - please see the next branch. I think the API no longer generates good automatic docs perhaps? IMHO this is a worth sacrifice, but maybe some tweaking is needed. The code had two parts which were basically full copies of eachother - removed one of them.

Actually, I did not test the API changes in the next branch, maybe the API does not even work (: Could you tell me how can I use it? Example command/code could be great :)

Plus - could you please try to make API work in standalone mode, too? That would be extra cool - feel free to reasonably extend backbone.py if needed.

graemeniedermayer commented 1 year ago

Yah I can get on that!

I was expecting a big break, plus getting it working with the standalone might require breaking changes too.

I think the API no longer generates good automatic docs perhaps?

It would probably be good to have some simplified aliasing that calls generate with reduced specific parameters. The option list is a little opaque. I really don't like string lists without documentation in apis. So "model_type" should be documented without having to look at source code.

The thing that's nice about keeping some form of override strategy inside of the api is it allows you to make internal changes to the codebase without impacted the api (you can reroute by change dictionary keys). So if source code change width to net_width that could break someone else api calls but with overriding you can reroute width to net_width. Of course you don't want to let the rerouting build up too much (eventual api breaks happen), but it does help decouple the api from internal changes.

Here's a test example with the api (This might not work with python3.9+)

import requests
from io import BytesIO
import base64
from PIL import Image
import json

with open("test_img.png", "rb") as image_file:
        img = base64.b64encode(image_file.read())

dics = {
  "depth_input_images": [img],
  "compute_device": "GPU",
  "model_type": "zoedepth_n (indoor)",
  "net_width": 512,
  "net_height": 512,
  "match_size": False,
  "boost": False,
  "invert_depth": False
}

# this points to auto1111. this is usually the default local address.
url = 'http://127.0.0.1:7860/depth/process'

#This is where the request is made
x = requests.post(url, json = dics)
print(x.text)
img = json.loads(x.text)['images'][0]
print(img)
#Image is decoded here. this might have changed in python 3.9
img_io = BytesIO(base64.b64decode(img))
im = Image.open(img_io)

#Image is
im.save('depth_img.png', 'PNG')
semjon00 commented 1 year ago

We don't advertise API yet and it is less than a week old - I think we could break it no issue. I think the current next has future-proof values (except for model selection, horizontal/vertical stacking and a couple of rembg values - where marked Legacy in the common_constants). Rerouting can be done with the approach that is in the next branch, too, I think. I would not expect crazy rerouting with such values.

But yeah - hard-to-explain dictionaries are not optimal. What if we could make every value of this dictionary an optional setting? But it is crucial to not violate don't-repeat-yourself principle - not a single copy of a list with 40+ options should exist. It is better to have 50 lines of tricky python code with * and ** operators, classes that extend Enums, etc., than a 40-line block with a copy of values from somewhere else, because these values are bound to change. The enum in common_constants could be modified to include option explanation, too. Then these explanations could be set get the values from the same strings for both the UI and the API.

Thank you for the example! will test soon.

graemeniedermayer commented 1 year ago

What if we could make every value of this dictionary an optional setting? But it is crucial to not violate don't-repeat-yourself principle

For APIs it's important to apply this to the clientside too. Everyone who writes a request tends to need to repeat themselves which means the basic requests should be really simple and obvious. Although I also do want most of the options to be accessible for people want to delve in.

It feels like we should just be able to dump the gradio interface information automatically at runtime somewhere (like a config file) because it has most of the relevant information. That way if the interface changes the api docs could be updated automatically. I'll have to look into that though because it might be complicated.

Mostly, I don't want the API breaking in every few updates. The api just feels tightly coupled to internals right now. Maybe we just need tests.

semjon00 commented 1 year ago

The simplest request consists of images and an empty dictionary, and it will output depthmaps - simple :)

Can't really figure out a way to abstract options more. After @thygate gives a word regarding the new input convention and we finalize it, I 1) will not break it (except where I already mentioned) and 2) will wait until you verify that there is no breakage in the API, if I change it.

Pinky promise.

It feels like we should just be able to dump the gradio interface information automatically at runtime somewhere (like a config file) because it has most of the relevant information.

No. Nothing should ever depend on WebUI UI. If you'd like to reuse the option explanations from the UI, please refactor them into common_constants.py and then use these from there. If you'd like to give a detailed explanations to the options, the best way would be to create a detailed explanation for the wiki. We need help with it 😉😅

graemeniedermayer commented 1 year ago

No. Nothing should ever depend on WebUI UI.

What's the philosophy behind this? An Api is an Application Programming Interface so it does make some sense for it to be dependent on the User Interface. They do have a similar responsibility. This of course would lead to a less stable API but it could be auto documented. To me nothing should be dependent on the API (except maybe automated tests).

While having a wiki entry is great, there has been a tendency for other wikis to not be updated in a timely fashion.

I don't really want common_constants.py to become too bloated. I picture API doc changes happening often whereas I don't think we would want common_constants.py to change often.

Haha don't make any pinky promises yet! I'm pretty available right now, but in the future you'll probably need to push without me.

semjon00 commented 1 year ago

I don't really want common_constants.py to become too bloated. I picture API doc changes happening often whereas I don't think we would want common_constants.py to change often.

Good point. Wiki it is then. The API won't be changed much (except for the new features), so no issue if these are not 100% up to date at all times. We need the wiki for the UI anyways, so makes sense to have 1 wiki for all the interfaces. Explaining what an option does is much more info compared to outlining the name and possible values for the option in the API.

Sometimes it makes sense to expose something trough UI that should not be available trough API, and vice versa - for example, currently meshes should not be generated via API. The API should not depend on Gradio - this prevents API from being run headlessly and makes it inherit bugs from the Gradio instance running on the user machine. Running headlessly is important for like a hundred (sometimes) non-obvious things - like faster tests, less bloat in the minimal installation, servers architecture with dozens of clients... And yes, depending one interface on another makes the top one (dependee) change frequently, whereas the bottom one is harder to change without breaking/changing the top one - and sometimes impossible. UI is expected to change faster than the API.

There is not much documentation in the UI. There should not be. Depending on it simply is not worth it.

semjon00 commented 1 year ago

Auto-generated docs from the UI is an interesting idea, but not so useful here IMHO

graemeniedermayer commented 1 year ago

There is not much documentation in the UI.

It is just the choices. I strongly dislike when apis have secret passwords like 'dpt_beit_large_512 (midas 3.1)'.

The enum in common_constants could be modified to include option explanation, too. Then these explanations could be set get the values from the same strings for both the UI and the API.

Rereading everything slowly, maybe this is the solution. Considering how standalone and extension might eventually have different very gradio interfaces. I'll abandon thinking about crazy ways to automatically steal the UI.


The current api does feel like it violated Postel's law/Robustness principle in a number of ways. A large number of requests would crash the server because the options are fed directly into the generation funnel. I'm not even sure there would be a "500 response". Also sending back an unfiltered unstructured image list is non-ideal (especially if they aren't all the same image type).

There does need to be some way to change the api defaults to boost=False (very slow) and net_size_match=True (Although this should be overridden to False if width and height are provided in request). I really don't want everyone to need to include those in every requests.

semjon00 commented 1 year ago

We are changing boost default to False anyways. We can account for Postel's law in the api_depthmap code. Just make sure to not make more endpoints than absolutely necessary. 'dpt_beit_large_512 (midas 3.1)' will not be an option, we will refactor model type as an enum. Wanted to do that once, but got nowhere and deleted the failed attempt. Feel free to change how outputs are sent - explicitly specifying the type and corresponding input image number. Maybe even return asynchronously - this is now supported by the funnel - that would be nice.