thygate / stable-diffusion-webui-depthmap-script

High Resolution Depth Maps for Stable Diffusion WebUI
MIT License

Create videos from the API #378

Closed davidmartinrius closed 11 months ago

davidmartinrius commented 12 months ago

Hello!

How are you doing? 😊 My name is David Martin Rius, and I have added new functionality to create a video from an image via the API.

It does not affect the current extension behaviour. It is just an enhancement.

The endpoint is the same, /depth/generate. I added a new option, "RUN_MAKEVIDEO_API", in common_constants.py (available when calling /depth/get_options).

If this code is OK, I will create a new endpoint in https://github.com/mix1009/sdwebuiapi to integrate it. If you don't know this other project: it is currently the most advanced open-source API client for automatic1111.

The "video_parameters" object has these properties:

        'mesh_fi_filename': "/your/stablediffusionui/automatic1111/stable-diffusion-webui/outputs/extras-images/depthmap-0026.obj",
        "vid_numframes": 300,
        "vid_fps": 40,
        "vid_traj": 1,
        "vid_shift": "-0.015, 0.0, -0.05",
        "vid_border": "0.03, 0.03, 0.05, 0.03",
        "dolly": False,
        "vid_format": "mp4", #vid_format and output_filename extension must match
        "vid_ssaa": 3,
        "output_filename": "/your/desired/output/path/filename.mp4"

Important:

  1. For now, the output filename will have an extra underscore. For example, if you pass /my/folder/video.mp4, the output will be /my/folder/video.mp4_

  2. The property "mesh_fi_filename" is optional; it can be None or the path of an .obj file. If you already have an .obj file, the video will be created much faster. Creating the mesh is the slowest part of the process, so I recommend reusing a mesh if you are rendering multiple videos.

On the other hand, if a required property is missing, an exception inside the run_makevideo_api function will tell you exactly what is missing. The function run_makevideo_api can be found in src/core.py.
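As a rough sketch of what that validation looks like (illustrative only; the actual list of required keys and the exception text live in run_makevideo_api in src/core.py):

```python
# Illustrative sketch of the required-parameter check; key names taken
# from the video_parameters object above, the helper name is hypothetical.
REQUIRED_VIDEO_PARAMS = [
    "vid_numframes", "vid_fps", "vid_traj", "vid_shift",
    "vid_border", "dolly", "vid_format", "vid_ssaa", "output_filename",
]

def check_video_parameters(video_parameters):
    """Raise a ValueError naming every required key that is missing."""
    missing = [k for k in REQUIRED_VIDEO_PARAMS if k not in video_parameters]
    if missing:
        raise ValueError(f"Missing video parameters: {', '.join(missing)}")
```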

You can use this snippet example to create a video from the API:

import requests
import base64
import json

image_path = "/path/to/your/image.jpg"

available_models = {
    'dpt_beit_large_512': 1, #midas 3.1
    'dpt_beit_large_384': 2, #midas 3.1
    'dpt_large_384': 3, #midas 3.0
    'dpt_hybrid_384': 4, #midas 3.0
    'midas_v21': 5,
    'midas_v21_small': 6,
    'zoedepth_n': 7, #indoor
    'zoedepth_k': 8, #outdoor
    'zoedepth_nk': 9,
}

if __name__ == '__main__':
    with open(image_path, "rb") as image_file:
        img = base64.b64encode(image_file.read()).decode()
    url = 'http://127.0.0.1:7860/depth/generate'
    payload = {
        "depth_input_images": [img],
        "options": {
            "compute_device": "GPU",
            "boost": True,
            "model_type": available_models['midas_v21'],  # can be an integer or a string
            "video_parameters": {
                # The property "mesh_fi_filename" is optional; it can be None or the path of an .obj file.
                # If you already have an .obj file, the video is created much faster. Creating the mesh is
                # the slowest part of the process, so reuse one when rendering multiple videos.
                #'mesh_fi_filename': None,  # optional
                'mesh_fi_filename': "/your/stablediffusionui/automatic1111/stable-diffusion-webui/outputs/extras-images/depthmap-0026.obj",
                "vid_numframes": 300,
                "vid_fps": 40,
                "vid_traj": 1,
                "vid_shift": "-0.015, 0.0, -0.05",
                "vid_border": "0.03, 0.03, 0.05, 0.03",
                "dolly": False,
                "vid_format": "mp4",  # vid_format and output_filename extension must match
                "vid_ssaa": 3,
                "output_filename": "/your/desired/output/path/filename.mp4"
            }
        }
    }

    x = requests.post(url, json=payload)
    response = json.loads(x.text)
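Since the written file currently gets the extra underscore mentioned above, a small helper can rename it back once the request completes (a sketch; the helper name is mine, and the path you pass is whatever you set as output_filename):

```python
import os

def strip_trailing_underscore(requested_path):
    """The API writes requested_path + '_' (see the note above);
    rename the file back to the requested name if it exists."""
    written = requested_path + "_"
    if os.path.exists(written):
        os.rename(written, requested_path)
    return requested_path
```

For example, strip_trailing_underscore("/your/desired/output/path/filename.mp4") after the POST returns.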

Thank you! Enjoy it! 😊

David Martin Rius

semjon00 commented 12 months ago

Hello! Overall, quite good! However, for code extensibility, a slightly different approach is required. I try to keep the core_generation_funnel as separated from the API as possible. I will point out some specific things to look at in a code review. This functionality is a great addition, and I like that it does not break anything.

Also a question: is it possible to later make the code support strings for model selection? That is, could we have a parameter that could be either an int or a string? If not, do you think it would be reasonable to have it as a string from the beginning and later add support for model names? I am not sure if anybody wants to add it (it would certainly be welcome), but it would be nice to work around this potential backward-compatibility issue.

davidmartinrius commented 12 months ago

Well, if you want, I can create a specific endpoint for this functionality, like /depth/generate/video, that does not use the core_generation_funnel but implements a separate function for that endpoint. Or better: if you could explain exactly what you want, I could try to program it.

About this question "is it possible to later make the code support strings for model selection? "

I do not understand what you are referring to. Could you explain in more detail, with examples, please? πŸ€”

Thank you!

davidmartinrius commented 12 months ago

Ah, I suppose you meant this:

available_models = {
    'dpt_beit_large_512': 1, #midas 3.1
    'dpt_beit_large_384': 2, #midas 3.1
    'dpt_large_384': 3, #midas 3.0
    'dpt_hybrid_384': 4, #midas 3.0
    'midas_v21': 5,
    'midas_v21_small': 6,
    'zoedepth_n': 7, #indoor
    'zoedepth_k': 8, #outdoor
    'zoedepth_nk': 9,
}

To make this mapping available inside the API, so you can pass either the model name as a string or the integer that matches it, is that right?

Like "model_type": "dpt_beit_large_512" or "model_type": 1?
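A sketch of how the API could accept both forms, using the same mapping as in the snippet above (the helper name is hypothetical, not from the actual code):

```python
AVAILABLE_MODELS = {
    'dpt_beit_large_512': 1,  # midas 3.1
    'dpt_beit_large_384': 2,  # midas 3.1
    'dpt_large_384': 3,  # midas 3.0
    'dpt_hybrid_384': 4,  # midas 3.0
    'midas_v21': 5,
    'midas_v21_small': 6,
    'zoedepth_n': 7,  # indoor
    'zoedepth_k': 8,  # outdoor
    'zoedepth_nk': 9,
}

def normalize_model_type(model_type):
    """Accept a model name (str) or its integer id; return the integer id."""
    if isinstance(model_type, int):
        if model_type in AVAILABLE_MODELS.values():
            return model_type
        raise ValueError(f"unknown model id: {model_type}")
    if model_type in AVAILABLE_MODELS:
        return AVAILABLE_MODELS[model_type]
    raise ValueError(f"unknown model name: {model_type}")
```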

semjon00 commented 12 months ago

So basically, depthmap_api.py should "wrap around" the core and ask it to do stuff, and then transform the results for API-specific needs. But in the current state of the MR it's leaking a bit: the core is "aware" of the API stuff happening.

semjon00 commented 12 months ago

Exactly. Sorry, did not sleep well tonight πŸ₯Ά

semjon00 commented 12 months ago

> if you wanted I can create a specific endpoint for this functionality like /depth/generate/video

Not sure... Full disclosure: I touched most of the original code in this project, but it was some time ago... Can't be very sure of everything right now, need to think a bit. Looking for maintainers :)

davidmartinrius commented 12 months ago

> So basically depthmap_api.py should "wrap around" the core and ask it to do stuff, and then transform this stuff for API-specific needs. But in this state of MR, it's leaking a bit - core is "aware" of the API stuff happening.

Yes, I am aware of that. I just followed the order of what was already programmed. It is clear that the API should wrap around the core and not be mixed into it.

Although I don't know the project enough to make such big changes. So I continued in the order in which things were already done.

I think restructuring the API and refactoring it requires another separate pull request and I don't know if I would be able to do it without very clear instructions on how you would want it.

semjon00 commented 12 months ago

I feel the same way; that's a big task. Then just please try to make the core_generation_funnel not call the API-specific code, and then I think we can call it a day. Parameter names, optimal design decisions, et cetera can wait, I suppose.

davidmartinrius commented 12 months ago

> if you wanted I can create a specific endpoint for this functionality like /depth/generate/video

> Not sure... Full disclosure: I touched most of the original code in this project, but it was some time ago... Can't be very sure of everything right now, need to think a bit. Looking for maintainers :)

Absolutely! :D You have a much better vision of it than me. If you want to do it, it will be appreciated.

davidmartinrius commented 12 months ago

I have tried to separate the code from lines 345 to 353 into a function outside of the core_generation_funnel, although I haven't found a clean way to do it. In any case, I need variables set in the core funnel to process the video later. So the only way I found is to yield those variables instead of calling run_makevideo_api, and to call run_makevideo_api in another part of the code. But I think it is not a good approach...

Please, could you suggest how to abstract that part into another function, so that the core doesn't get mixed up with the API tasks?
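For illustration, the workaround I found looks roughly like this (function bodies are simplified placeholders, not the real funnel code):

```python
def run_makevideo_api(mesh_fi, video_parameters):
    # Stand-in for the real video-rendering function in src/core.py.
    return f"rendered {video_parameters['vid_format']} from {mesh_fi}"

def core_generation_funnel(inputs):
    # ...imagine the existing depth-map / mesh generation here...
    mesh_fi = "outputs/depthmap-0001.obj"  # placeholder mesh path
    # Yield the variables the video step needs instead of calling
    # run_makevideo_api here, so the core stays unaware of the API:
    yield {"mesh_fi": mesh_fi}

def api_video_handler(inputs, video_parameters):
    # API-side code consumes the yielded variables and runs the
    # video step itself.
    return [run_makevideo_api(step["mesh_fi"], video_parameters)
            for step in core_generation_funnel(inputs)]
```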

Thank you!

graemeniedermayer commented 12 months ago

You might be able to use a similar method to run_makevideo in common_ui.py. Maybe you could use the `if inp[go.GEN_INPAINTED_MESH]:` branch (line 336) to generate a mesh from the core funnel, and then run run_makevideo afterwards in the API code. It does feel like inpainted mesh generation should be extracted into a separate function outside core_generation_funnel (but that seems like a separate task). Is that the key issue: that the inpainted mesh generation requires variables from the core funnel?

It also feels like the second part of run_makevideo_api could call run_makevideo directly to reduce code repetition. I might be missing the difference, though.

Great work!

davidmartinrius commented 11 months ago

Ok, I'll do it this way. As soon as programmed I'll let you know. Thanks!

davidmartinrius commented 11 months ago

I made several changes.

  1. The model_type can be an integer or a string; the API will handle both.
  2. I removed the extra code from the core_generation_funnel.
  3. The function run_makevideo_api has been removed; there is only run_makevideo, and it can now receive 2 extra nullable parameters: outpath and basename. So when a user calls the API, they can pass the desired output folder and file name. (It still adds an extra underscore that I can't control without making too many changes: for example, if I pass /my/folder/video.mp4 the output will be /my/folder/video.mp4_. This is because output_3d_photo in inpaint/mesh.py adds an extra underscore. I preferred not to touch anything else because it would increase the complexity.)
  4. I updated the first message in this thread and changed several parameters, so users can copy-paste the code and it will work.
  5. I only call core_generation_funnel to generate the mesh.

So the core is no longer mixed with the API when generating videos from the API.

What do you think about these changes?

Thank you!

David Martin Rius

semjon00 commented 11 months ago

Hello again :)

I did not really get in to all the details of the code, but overall (at least architecturally), it looks very, very good πŸ‘ I will give it some more attention and then merge.

semjon00 commented 11 months ago

Indeed, very good code! Merging with next to no modifications. There are some security risks in exposing this functionality in the API, but since we never advertised the API as something that can be made accessible from the internet, this is OK.

Thank you so much for contributing this code ❀️ I am happy that this project can now be more useful for people. I would be glad to collaborate with you more and see you create more code for this project 😊 Feel free to let me know if something does not work right.

davidmartinrius commented 11 months ago

> Indeed, very good code! Merging with next to no modifications. There are some security risks from exposing this functionality in API, but since we never advertised the API as something that can be made accessible from the internet, this is ok.
>
> Thank you so much for contributing this code ❀️ I am happy that now this project can be more useful for people. I would be glad to collaborate with you more 😊 Feel free to let me know if something does not work right.

Hi @semjon00 !!

I am very glad to contribute. Thank you so much for merging the code.

  1. Now I will create a new endpoint in https://github.com/mix1009/sdwebuiapi; this way, the plugin will be easier to use via the API for any user. (For both endpoints available in the API.)

  2. On the other hand, I am trying to make the mesh generation work with pytorch instead of numpy to accelerate the process. Some mesh generation steps already use pytorch, but not in an optimal way, and pytorch is not used everywhere. (I mean the code inside inpaint/mesh.py, inpaint/mesh_tools.py, inpaint/bilateral_filtering.py, etc.) Because of that, the mesh generation takes a long time: it runs on the CPU because of numpy. So my idea is to migrate to pytorch tensors. It is not an easy task, and it requires modifying the code in blocks, very carefully. Maybe there are other factors affecting performance, but I don't know yet.
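As a tiny illustration of the kind of migration I mean (a hypothetical operation, not the actual mesh code), the same computation on numpy arrays and on torch tensors, where the latter can be moved to the GPU by passing device="cuda":

```python
import numpy as np
import torch

def smooth_rows_numpy(a):
    # CPU-bound numpy version: average each element with its right neighbour
    return (a[:, :-1] + a[:, 1:]) / 2.0

def smooth_rows_torch(a, device="cpu"):
    # Same operation on torch tensors; device="cuda" runs it on the GPU
    t = torch.as_tensor(a, device=device)
    out = (t[:, :-1] + t[:, 1:]) / 2.0
    return out.cpu().numpy()
```

The migration work is mostly replacing such numpy array operations with their tensor equivalents while keeping the intermediate results on the device.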

Do you know the critical points when generating a 3D mesh? I mean the ones that especially slow down the process and could use the GPU for a speed-up.

If you had to do it where would you start?

Thank you!

David Martin Rius

semjon00 commented 11 months ago

If you had to do it where would you start?

Honestly, I am not even exactly sure what magic happens over there. @thygate is the person who added this to the script (I think he borrowed the code from somewhere else), so he knows how it works better than me. The proper way to change the code would be to find the upstream and contribute there (granted, hopefully that repository is still maintained), and then tweak this repository to use the newest upstream version.

davidmartinrius commented 11 months ago

I think that code may come from https://github.com/vt-vl-lab/3d-photo-inpainting and also facebookresearch. Unfortunately, the last updates there are from 3 years ago, so this is the most up-to-date repository, although it is almost a copy&paste of that project, if I am not wrong.

Maybe @thygate has an idea on how to do it.

Thank you!

semjon00 commented 11 months ago

Right, I remember now. Indeed, then the most reasonable thing to do is to just patch it here. After the fact I/we might try to find other "forks" and MR the changes there.