shimizust opened this issue 9 months ago

Hi, thanks for making this project available!
I was wondering if it is possible to point directly to local models, instead of downloading from a URL or HF Hub?
Hey! Sure, thanks.
Uh, in theory this should be pretty easy, but I've never tried it personally :sweat_smile: Don't forget that if you're deploying your model somewhere else, you'll have to include the model in your Docker build.
So, I think it should be this simple:
- Set RUNTIME_DOWNLOADS=0
- Set MODEL_ID to the directory containing your local model (in diffusers format).
- Set MODEL_PRECISION="fp16" if relevant.

That's assuming you only want one model loaded per run. If you want to be able to switch models at runtime, you can just pass MODEL_ID (pointing to the directory containing the local model) as part of your request. You may need to also set RUNTIME_DOWNLOADS=1, but first try without that.
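For the single-model case, here's an untested sketch of starting the container with those variables via the Python Docker SDK (the image name and mount paths below are placeholders, not anything docker-diffusers-api prescribes):

import docker  # pip install docker

client = docker.from_env()

container = client.containers.run(
    "my-docker-diffusers-api:latest",  # placeholder: your own image build
    detach=True,
    ports={"8000/tcp": 8000},
    volumes={
        # Placeholder host directory holding the diffusers-format model.
        "/host/models/stable-diffusion-v1-4": {"bind": "/models/stable-diffusion-v1-4", "mode": "ro"},
    },
    environment={
        "RUNTIME_DOWNLOADS": "0",
        "MODEL_ID": "/models/stable-diffusion-v1-4",  # path as seen inside the container
        "MODEL_PRECISION": "fp16",
    },
    # GPU/device options omitted here; add device_requests etc. as needed for your runtime.
)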
Hope that helps! Let me know either way.
Thanks for the tips, @gadicc!
I tried this using https://huggingface.co/CompVis/stable-diffusion-v1-4 in a volume mounted to the container. I think it's loading the model fine, but I'm getting an error during inference. Any ideas what I'm doing wrong?
import requests

url = "http://localhost:8000/"  # inference server endpoint

data = {
    "modelInputs": {
        "prompt": "Super dog",
        "num_inference_steps": 5
    },
    "callInputs": {
        "MODEL_ID": "/shared/user/sshimizu/stable-diffusion-v1-4",
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": "DPMSolverMultistepScheduler",
        "RUNTIME_DOWNLOADS": 0,
        "MODEL_PRECISION": "fp16",
        "safety_checker": "true",
    },
}

response = requests.post(url, json=data)
I get this error:
{"$error":{"code":"PIPELINE_ERROR","name":"TypeError","message":"unsupported operand type(s) for %: 'int' and 'NoneType'","stack":"Traceback (most recent call last):\n File \"\/api\/app.py\", line 638, in inference\n images = (await async_pipeline).images\n File \"\/opt\/conda\/lib\/python3.10\/asyncio\/threads.py\", line 25, in to_thread\n return await loop.run_in_executor(None, func_call)\n File \"\/opt\/conda\/lib\/python3.10\/concurrent\/futures\/thread.py\", line 58, in run\n result = self.fn(*self.args, **self.kwargs)\n File \"\/opt\/conda\/lib\/python3.10\/site-packages\/torch\/utils\/_contextlib.py\", line 115, in decorate_context\n return func(*args, **kwargs)\n File \"\/opt\/conda\/lib\/python3.10\/site-packages\/diffusers\/pipelines\/stable_diffusion\/pipeline_stable_diffusion.py\", line 1062, in __call__\n if callback is not None and i % callback_steps == 0:\nTypeError: unsupported operand type(s) for %: 'int' and 'NoneType'\n"}}
And here are the pod logs:
[2024-02-13 01:29:10 +0000] - (sanic.access)[INFO][127.0.0.1:36822]: POST http://localhost:8000/ 200 975
{
"modelInputs": {
"prompt": "Super dog",
"num_inference_steps": 5
},
"callInputs": {
"MODEL_ID": "/shared/user/sshimizu/stable-diffusion-v1-4",
"PIPELINE": "StableDiffusionPipeline",
"SCHEDULER": "DPMSolverMultistepScheduler",
"RUNTIME_DOWNLOADS": 0,
"MODEL_PRECISION": "FP16",
"safety_checker": "true"
}
}
download_model {'model_url': None, 'model_id': '/shared/user/sshimizu/stable-diffusion-v1-4', 'model_revision': None, 'hf_model_id': None, 'checkpoint_url': None, 'checkpoint_config_url': None}
loadModel {'model_id': '/shared/user/sshimizu/stable-diffusion-v1-4', 'load': False, 'precision': 'FP16', 'revision': None, 'pipeline_class': <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'>}
pipeline <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'>
Downloading model: /shared/user/sshimizu/stable-diffusion-v1-4
Keyword arguments {'use_auth_token': None} are not expected by StableDiffusionPipeline and will be ignored.
Loading pipeline components...: 57%|█████▋ | 4/7 [00:00<00:00, 7.00it/s]`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 7.70it/s]
Downloaded in 913 ms
2024-02-13 06:24:09.407129 {'type': 'loadModel', 'status': 'start', 'container_id': 'inference-server', 'time': 1707805449407, 't': 1511, 'tsl': 0, 'payload': {'startRequestId': None}}
loadModel {'model_id': '/shared/user/sshimizu/stable-diffusion-v1-4', 'load': True, 'precision': 'FP16', 'revision': None, 'pipeline_class': <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'>}
pipeline <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'>
Loading model: /shared/user/sshimizu/stable-diffusion-v1-4
Keyword arguments {'use_auth_token': None} are not expected by StableDiffusionPipeline and will be ignored.
Loading pipeline components...: 43%|████▎ | 3/7 [00:00<00:00, 10.72it/s]`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 8.13it/s]
Loaded from disk in 865 ms, to gpu in 811 ms
2024-02-13 06:24:11.083625 {'type': 'loadModel', 'status': 'done', 'container_id': 'inference-server', 'time': 1707805451084, 't': 3188, 'tsl': 1677, 'payload': {'startRequestId': None}}
Initialized StableDiffusionPipeline for /shared/user/sshimizu/stable-diffusion-v1-4 in 1ms
{'cross_attention_kwargs': {}}
2024-02-13 06:24:11.121568 {'type': 'inference', 'status': 'start', 'container_id': 'inference-server', 'time': 1707805451122, 't': 3226, 'tsl': 0, 'payload': {'startRequestId': None}}
{'callback': <function inference.<locals>.callback at 0x77dc81dfa830>, '**model_inputs': {'prompt': 'Super dog', 'num_inference_steps': 5, 'generator': <torch._C.Generator object at 0x77dc827abf50>}}
20%|██ | 1/5 [00:00<00:00, 12.56it/s]
[2024-02-13 06:24:11 +0000] - (sanic.access)[INFO][127.0.0.1:45132]: POST http://localhost:8000/ 200 975
Hey @shimizust
Looks like a bug... maybe upstream diffusers removed a default; otherwise I'm not sure why we never saw this before.
Let me first explain the [most relevant lines of the] error and then the fix. You don't need to know or understand any of this, so feel free to skip it if it's not of interest.
Line: if callback is not None and i % callback_steps == 0:
Error: unsupported operand type(s) for %: 'int' and 'NoneType'

So it's trying to calculate x % y (the modulo operation, i.e. if we divide x by y, what will the remainder be?). Obviously this requires two numbers (integers), but in this case Python is telling us that the second operand (callback_steps) is not a number, it's a NoneType (i.e. it doesn't exist), and this is why we get the error.
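In plain Python, the failing expression boils down to:

i = 3                  # current denoising step
callback_steps = None  # never given a value
i % callback_steps     # TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'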
As for what leads to this error, that's a bit more complicated. In docker-diffusers-api, we automatically set a callback (to be run every callback_steps steps) if none is provided. This used to work fine, but I guess diffusers now expects callback_steps to be given explicitly whenever callback is.
So, the workaround (until I push a proper fix) is to provide a modelInput called callback_steps with an integer value, e.g.

{
  "modelInputs": {
    // ...
    "callback_steps": 20
  },
  // ...
}
This just controls how often we report back the current progress via webhook... if it's irrelevant for your application just use a number higher than your num_inference_steps.
Two other things I noticed (unrelated):
You included "RUNTIME_DOWNLOADS": 0
but this is something that's only recognized via an environment variable, not as part of the request.
You have safety_checker: "true"
but this should be a boolean
and not a string
, i.e. True
not "true"
. Not really sure how this will affect things but just to avoid any problems further down `:)
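For reference, a sketch of your request with all three adjustments applied (same endpoint and local model path as in your example):

import requests

url = "http://localhost:8000/"  # inference server endpoint

data = {
    "modelInputs": {
        "prompt": "Super dog",
        "num_inference_steps": 5,
        # Workaround: pass an explicit integer so diffusers never sees callback_steps=None.
        "callback_steps": 20,
    },
    "callInputs": {
        "MODEL_ID": "/shared/user/sshimizu/stable-diffusion-v1-4",
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": "DPMSolverMultistepScheduler",
        "MODEL_PRECISION": "fp16",
        "safety_checker": True,  # boolean, not the string "true"
        # RUNTIME_DOWNLOADS removed: it's read from the environment, not from the request.
    },
}

response = requests.post(url, json=data)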
Good luck!
Thanks, @gadicc! Setting "callback_steps" to an int in "modelInputs" works, and I'm able to generate images from my local model now.
I guess setting RUNTIME_DOWNLOADS isn't strictly necessary then.