vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0
5.52k stars 402 forks

[Issue]: Model selector loading "for ever" on mobile device #480

Closed cibernicola closed 1 year ago

cibernicola commented 1 year ago

Issue Description

I'm not sure of the exact conditions needed to reproduce the problem, but it has already happened to me several times:

When changing the model via the selector, it stays in "loading" status indefinitely. After a while everything works, although the loading indicator remains, which disables the option to change the model :S

image

Version Platform Description

commit: 98adfb31 Windows 10 Python 3.10 RTX 3090 Chrome/Firefox/Vivaldi

vladmandic commented 1 year ago

when this happens, can you post part of the console output that shows model loading? last 5-10 lines of the output should be enough.

myndxero commented 1 year ago

I get this a lot too, along with other weird issues. I thought it was just me, but I keep seeing others running into the same problems. Some of it seems to be corrected when I keep transformers rolled back to 4.19.2, as suggested in that other thread. I haven't re-enabled all the extensions either, just the main ones I use; LDSR and a couple of other built-ins aren't even on atm.

PS - I love that feature btw, being able to disable extensions without removing them. I just need to find where to change that via a text file, should something cause it to never launch.

EDIT - I've noticed some correlation with it occurring when something fails to generate, possibly memory related, as I switch back and forth between two projects, one on my phone and the other on PC. Just a semi-layman's guess.

vladmandic commented 1 year ago

I love that feature btw, being able to disable extensions without removing them. Just need to find where to change that via text file should something cause it never to launch.

config.json -> disabled_extensions
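For anyone hunting for it, that entry is a plain JSON array of extension names; a minimal sketch of what it might look like (the extension names here are just examples, not a recommendation):

```json
{
  "disabled_extensions": [
    "LDSR",
    "example-extension"
  ]
}
```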

kaisonto commented 1 year ago

@cibernicola how many models do you have? I had the same issue; after I moved some models to my backup drive it seemed to resolve itself. I think the problem is correlated with having too many models to load. I haven't worked out what the limit is; I had over 200 and managed to curate down to below 100.

vladmandic commented 1 year ago

model load and model list are two different things. this issue was created for model loading. my best guess is that either storage was slow, and/or that was the first time that specific model was requested, so the code tried to calculate its sha256 hash (it's stored so it doesn't have to be recalculated each time). but gradio ui controls have a built-in timeout which is not configurable, so even when the model did eventually get loaded, the ui control still showed up as inconsistent.
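The hash-and-cache step described above can be sketched like this (a minimal illustration only; the file names, cache layout, and function name are assumptions, not SD.Next's actual implementation):

```python
import hashlib
import json
import os

def sha256_with_cache(model_path: str, cache_path: str = "cache.json") -> str:
    """Compute the sha256 of a model file, storing the result so the slow
    full-file read only happens the first time a model is requested."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)
    if model_path in cache:
        # cached: skip re-reading a multi-gigabyte checkpoint
        return cache[model_path]
    h = hashlib.sha256()
    with open(model_path, "rb") as f:
        # hash incrementally in 1 MB chunks to keep memory flat
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    cache[model_path] = h.hexdigest()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return cache[model_path]
```

This is why the first load of a model is much slower than subsequent loads (note `hash=15.8s` in the timing breakdown posted later in this thread), and why a UI with a fixed timeout can give up before the hash finishes.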

mart-hill commented 1 year ago

I have the same issue, but I use the [image] button after starting the UI, could that be it? Then I generated some images, and also switched the hires fix upscaling option back and forth from normal to "legacy" (the recent commits disabled the "legacy" option, so I re-enabled it for myself). After selecting the model, the shell doesn't even show that I selected a model to load. The "Refresh model list" button works, despite the list being unavailable. [image]

Loading model from cache: control_v11p_sd15_softedge [a8575a2a]
Loading preprocessor: pidinet_safe
Pixel Perfect Mode Enabled.
resize_mode = ResizeMode.OUTER_FIT
raw_H = 1800
raw_W = 1800
target_H = 816
target_W = 512
estimation = 512.0
preprocessor resolution = 512
100%|██████████████████████████████████████████████████████████████████████████████| 40/40 [00:08<00:00,  4.63it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 40/40 [00:33<00:00,  1.20it/s]
Available models: X:\AI\automatic\models\Stable-diffusion 703
Available models: X:\AI\automatic\models\Stable-diffusion 703

I pressed "refresh" twice while the list was "locked" from any further use. After generating the image I decided to change the model, but the shell doesn't reflect that. I can still generate images, set options, etc.; I just can't change the model anymore unless I completely restart the UI.

vladmandic commented 1 year ago

I wish I could reproduce it; that would make it so much simpler to debug & fix. But I have to say one thing - I really appreciate it when someone describes the entire workflow like this instead of just saying "me too"!

mart-hill commented 1 year ago

Edit: Me too! :)

Jokes aside, it's practically a sure case: if I load the UI, press "restore", generate some images, change LoRAs or their weights in the prompt, change the prompt and the ControlNet parameters a bit (with an image loaded), then try to change the model, there's about a 50% chance that I'll get a never-ending "refresh". :) For now, I'll use that "red" button under Generate for choosing checkpoints. Or maybe that could become the main way to choose a model in the future? 🙂

myndxero commented 1 year ago

I have the same issue, but I use [image] button after starting the UI [...] I just can't change a model anymore, unless I completely restart the UI.

You can cheat: start another instance and change the model there, and you're set in the first UI. It might fix the disconnect and show the model too.

itsmeherefolks commented 1 year ago

I had the same problem when loading model f222 for the first time. In the output it can be seen that the weights for the model were not loaded. I did the following steps:

Repo already cloned, using it as install directory
################################################################
Create and activate python venv
################################################################
Launching launch.py...
################################################################
16:25:46-029827 INFO Python 3.10.10 on Linux
16:25:46-127764 INFO Version: da35bfb7 Wed Apr 26 08:08:24 2023 -0400
16:25:46-519788 INFO Latest published version: 93b0de7e599453027ad7cab6266b42920ebc1250 2023-04-26T13:02:32Z
16:25:46-521440 INFO Setting environment tuning
16:25:46-522834 INFO nVidia toolkit detected
16:25:49-025336 INFO Torch 2.0.0+cu118
16:25:49-058303 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
16:25:49-091313 INFO Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12042 Arch (8, 6) Cores 28
16:25:49-095004 INFO Verifying requirements
16:25:49-133350 INFO No changes detected: Quick launch active
16:25:49-134947 INFO Running extension preloading
16:25:49-136277 INFO Server arguments: []
No module 'xformers'. Proceeding without it.
Available models: /home/tom/automatic/models/Stable-diffusion 7
ControlNet v1.1.112
ControlNet v1.1.112
Loading theme: black-orange
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Initializing middleware
Checkpoint model.ckpt not found; loading fallback f222.safetensors
Loading weights: /home/tom/automatic/models/Stable-diffusion/f222.safetensors ━━━━ 0.0/4.3 GB -:--:--
Creating model from config: /home/tom/automatic/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Calculating sha256: /home/tom/automatic/models/Stable-diffusion/f222.safetensors f300684443092d39cd717c92ae19836114960a560dabb887d2fca370e2cc2531
Applying scaled dot product cross attention optimization
Embeddings loaded: (0)
Model loaded in 23.8s (load=1.0s create=0.6s hash=15.8s apply=4.8s vae=0.9s move=0.5s embeddings=0.1s)
Startup time: 36.8s (torch=5.2s gradio=1.9s libraries=1.2s codeformer=0.2s scripts=2.7s ui=1.5s start=0.2s scripts app_started_callback=0.1s checkpoint=23.8s)
Available models: /home/tom/automatic/models/Stable-diffusion 7
Available models: /home/tom/automatic/models/Stable-diffusion 7
Progress 0.84it/s ━━━━ 100% 0:00:00 0:00:11
Available models: /home/tom/automatic/models/Stable-diffusion 7
Available models: /home/tom/automatic/models/Stable-diffusion 7
Traceback (most recent call last):
  File "/home/tom/automatic/venv/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/tom/automatic/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/home/tom/automatic/venv/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/tom/automatic/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/tom/automatic/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
asyncio.exceptions.CancelledError
WebSocket closed (ignore asyncio.exceptions.CancelledError)
Server shutdown
Server restart
Available models: /home/tom/automatic/models/Stable-diffusion 7
ControlNet v1.1.112
Loading theme: black-orange
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Initializing middleware
Startup time: 3.6s (scripts=1.9s ui=1.3s start=0.2s scripts app_started_callback=0.1s)

JohanVlugt commented 1 year ago

I have the same problem on the latest commit, running Ubuntu 22.04.

This problem exists on my Android/Apple mobile devices with Firefox and Chrome, even with desktop view enabled. On the local machine or a laptop this does not happen.

Steps to reproduce:

1) Install with the webui.sh.
2) Load up SD webui on a mobile device.

cibernicola commented 1 year ago

I think it is related to the image gallery plugin; sometimes it does not pass the selected image to txt2img or the selected destination.

The error in console is this:

Traceback (most recent call last):
  File "L:\automatic\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "L:\automatic\venv\lib\site-packages\gradio\blocks.py", line 1078, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "L:\automatic\venv\lib\site-packages\gradio\blocks.py", line 994, in postprocess_data
    block = self.blocks[output_id]
KeyError: 1086
Traceback (most recent call last):
  File "L:\automatic\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "L:\automatic\venv\lib\site-packages\gradio\blocks.py", line 1078, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "L:\automatic\venv\lib\site-packages\gradio\blocks.py", line 994, in postprocess_data
    block = self.blocks[output_id]
KeyError: 1086

It's not that it doesn't load the models; it's the UI that gets "stuck". Now I have also enabled the VAE selector, and that is the one that keeps loading indefinitely:

image

electrashave commented 1 year ago

Seems the info I got on another issue is related to this rather than issue #535. Copying the information here for visibility. Side note: you CAN use the webui on a mobile device by disabling the queue, but this is a workaround, not a fix, and not ideal.

Steps I used to reproduce:

~I checked the dev tools console and it looks like it's getting stuck on sd_model_callback in ui.js for some reason, as it keeps displaying "Loading XXXXX" over and over~ Comparing the WebSocket messages sent when the issue is not present to when I can reproduce it, the model list is not being sent when you first load the webui, which I suspect is the root cause.

Screenshots showing what I mean: https://imgur.com/a/KytHThv

That's as far as I've gotten, hope it helps!

vladmandic commented 1 year ago

sd_model_callback() is not the root cause, that's the code i added yesterday to try to trace this exact issue, but it's not doing much at the moment as it's not complete yet. yes, if it's still inside that function, that only says that the "progress bar" has not disappeared. once the "progress bar" disappears, the callback exits - that will allow me to monitor when the progress bar stays on screen after the actual loading has completed. but like i said, it's not finished yet.

electrashave commented 1 year ago

I believe the root cause is the WebSocket issue explained above; the sd_model_callback() part was just an observation, my bad. Thanks for your hard work on this btw! I can consistently reproduce the issue with the steps above if you need any testing done.

vladmandic commented 1 year ago

There are several websocket-related items fixed in the latest gradio. I need to test it, and if it goes well I'll upgrade from 3.23 to 3.28 next week.

Btw, I wish there was an option in Gradio to select HTTP vs WS, but it auto-decides - if the queue is enabled it uses WS, and if not, it uses HTTP. And in reality, I'd like to have queues with HTTP.
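The coupling described here can be pictured with a tiny sketch (this models the auto-decide behavior as described in this thread; it is not gradio's actual code, and `select_transport` is a made-up name):

```python
# Illustrative only: gradio 3.x auto-selects the transport from the queue
# setting, with no independent knob for HTTP vs WebSocket.
def select_transport(queue_enabled: bool) -> str:
    """Queued events go over WebSocket; with the queue disabled, plain HTTP."""
    return "websocket" if queue_enabled else "http"

print(select_transport(True))   # what --listen with the queue gives you
print(select_transport(False))  # what --disable-queue falls back to
```

This is why `--disable-queue`, mentioned later in the thread, works as a mobile workaround: it forces the HTTP path and sidesteps the WebSocket issues entirely.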

cibernicola commented 1 year ago

It's not that, or at least not only that. In my case it has to do with the extensions: I have removed several and, so far, it has not happened again.

tranzmatt commented 1 year ago

There are several websocket-related items fixed in the latest gradio. I need to test it, and if it goes well I'll upgrade from 3.23 to 3.28 next week.

Btw, I wish there was an option in Gradio to select HTTP vs WS, but it auto-decides - if the queue is enabled it uses WS, and if not, it uses HTTP. And in reality, I'd like to have queues with HTTP.

Let's hope because it is maddening to have to kill/restart both the app and force a reload of the browser on a regular basis. The model selection drop-down is the chief culprit, but it's happened elsewhere. Gradio 3.16 wasn't ideal, but it was better than 3.23 in terms of stability by light-years.

vladmandic commented 1 year ago

I agree, but I can't go back anymore. I hope the new one is better...

vladmandic commented 1 year ago

gradio was updated yesterday, can you check if this issue is resolved with the update?

mart-hill commented 1 year ago

I checked out back to the eaea88a4 commit for now; I'll test! :) With --disable-queue, there's no problem with the model list for sure.

electrashave commented 1 year ago

gradio was updated yesterday, can you check if this issue is resolved with the update?

The list doesn't appear to load forever anymore, but there are still issues when connecting via a mobile device. On a fresh install of the latest version I am unable to generate any images on mobile: it just says "Waiting...", then that disappears, and clicking "Generate" no longer does anything. Among other things.

Here are the HAR logs from the dev tools network tab (contains ONLY the requests made when pressing "Generate", so you don't have to sift through them): https://anonfiles.com/l0Z0R9odz7/SDWebUIExpectedNetworkLog_har https://anonfiles.com/x3ZcRfo9zc/SDWebUIMobileNetworkLog_har Along with screenshots too, if that makes it easier: https://imgur.com/a/veMHlvk

If there's anything else I can grab for you let me know! Should we communicate with the gradio devs on this?

vladmandic commented 1 year ago

the original problem for which this issue was created was a general one, not specific to mobile devices. i need to try to reproduce using a mobile device - which device and browser are you using?

electrashave commented 1 year ago

I'm only able to reproduce issues with a mobile device; no telling if this is the exact same issue the user who opened this issue was having, or whether fixing this will fix users experiencing similar issues on desktop.

I'm using an Android phone, a OnePlus Nord N10 to be specific. I was able to reproduce the issue on Chrome and Brave; I haven't tested others. I test using a fresh copy of the webui each time, launched with webui --listen.

I'm going to test using a different router for my local network when I get the chance so we can rule that out as a potential cause. Will report my findings when I do

yamasoo commented 1 year ago

Same on PC for the Chrome browser (112.0.5615.138), but it doesn't happen in Edge...

vladmandic commented 1 year ago

updated gradio to 3.29 with several fixes. not sure if it's going to resolve this, but worth a shot - can anyone check?

mart-hill commented 1 year ago

Should I check with the --disable-queue arg, or the normal way?

vladmandic commented 1 year ago

both/either?

mart-hill commented 1 year ago

With --disable-queue it doesn't happen anymore since that fix (the one that enabled us to use this arg in the first place). I'll check with WebSockets now. :)

mart-hill commented 1 year ago

For now, with WebSockets on, the bug doesn't happen. I was just switching the models, though.

I got the refresh bug when I tried to change the model while the UI was busy loading the ViT/L14 model to evaluate the aesthetics of the image. I'll disable that on the next attempt. I'll stress: this is while I'm using WebSockets (which are really nasty for me), and many of the "send to" functionalities between tabs/extensions may (or may not) work for me because of that.

vladmandic commented 1 year ago

ok, so websockets aside (and we have a separate issue for that), this looks like it's resolved?

mart-hill commented 1 year ago

Yes!

vladmandic commented 1 year ago

ok, closing this issue and let's keep working on the other one!