lllyasviel / Omost


QOL Improvements #27

Open · d8ahazard opened this issue 3 months ago

d8ahazard commented 3 months ago

Add:

  1. Args for custom checkpoint, llm, outputs locations.
  2. Args to pass HF token on startup.
  3. Args for default llm/checkpoint on startup.
  4. Selector for custom/default checkpoints, llms.
  5. Seed randomization via -1
  6. Save outputs to user-specified dir.
  7. Load any checkpoint via from_single_file (see the sketch after this list)
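
For illustration only, here is a rough sketch of what these startup options and the `from_single_file` loading could look like; the argument names and the `load_checkpoint` helper are hypothetical, not the PR's actual implementation:

```python
# Hypothetical sketch of the proposed startup options; names are
# illustrative and may not match the PR.
import argparse

import torch
from diffusers import StableDiffusionXLPipeline

parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint-dir", default="models/checkpoints",
                    help="Folder holding custom SDXL checkpoints.")
parser.add_argument("--llm-dir", default="models/llm",
                    help="Folder holding local LLM weights.")
parser.add_argument("--outputs-dir", default="outputs",
                    help="Where generated images are saved.")
parser.add_argument("--hf-token", default=None,
                    help="Hugging Face token passed on startup.")
parser.add_argument("--default-checkpoint", default=None,
                    help="Checkpoint preselected in the UI.")
args = parser.parse_args()


def load_checkpoint(path: str) -> StableDiffusionXLPipeline:
    # diffusers' from_single_file loads an all-in-one .safetensors file
    # (item 7 above); fp16 keeps VRAM usage manageable.
    return StableDiffusionXLPipeline.from_single_file(
        path, torch_dtype=torch.float16
    )


if args.default_checkpoint:
    pipe = load_checkpoint(args.default_checkpoint)
```
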
xhoxye commented 3 months ago

```
Running on local URL: http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\queueing.py", line 528, in process_events
    response = await route_utils.call_process_api(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\blocks.py", line 1908, in process_api
    result = await self.call_function(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\blocks.py", line 1497, in call_function
    prediction = await utils.async_iteration(iterator)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 632, in async_iteration
    return await iterator.__anext__()
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
    response = await iterator.__anext__()
  File "E:\AI\Omost\Omost\chat_interface.py", line 554, in _stream_fn
    first_response, first_interrupter = await async_iteration(generator)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 632, in async_iteration
    return await iterator.__anext__()
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 625, in __anext__
    return await anyio.to_thread.run_sync(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
    return next(iterator)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "E:\AI\Omost\Omost\gradio_app.py", line 208, in chat_fn
    seed = np.random.randint(0, 2 ** 32 - 1)
  File "numpy\random\mtrand.pyx", line 780, in numpy.random.mtrand.RandomState.randint
  File "numpy\random\_bounded_integers.pyx", line 2881, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32

Last assistant response is not valid canvas: expected string or bytes-like object
```

(two screenshots attached)

ginto-sakata commented 3 months ago

@xhoxye seems like you have a 32-bit version of numpy

xhoxye commented 3 months ago

The original version works fine @ginto-sakata

ginto-sakata commented 3 months ago

@xhoxye that's because @d8ahazard added random seed generation, for which the upper limit is 2**32 - 1. Your error indicates that you might have a 32-bit version of numpy. Check whether your numpy integer type is 64-bit:

>>> import numpy as np
>>> print(np.int_)
<class 'numpy.int64'>
xhoxye commented 3 months ago

@ginto-sakata How do I change it?

d8ahazard commented 3 months ago

Try another pull, I think I fixed the precision issue.

xhoxye commented 3 months ago

I'm still getting the same error here unless I force seed = np.random.randint(0, 2 ** 31 - 1). Reading hf_download when rendering an image also throws an error, and the model I copied into the Omost\models\checkpoints folder isn't picked up when I refresh.

ginto-sakata commented 3 months ago

> @ginto-sakata How do I change it?

Just make sure you have 64-bit python and install numpy again (you might want to do a clean install just in case)

edit: Never mind, the problem is not in your system, this is intended behavior for numpy on windows: https://numpy.org/devdocs/reference/random/generated/numpy.random.randint.html

> This function defaults to the C-long dtype, which is 32bit on windows and otherwise 64bit on 64bit platforms

I have tried on my Windows system and yes, it shows `<class 'numpy.int32'>`
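
One portable workaround is to request an explicit 64-bit dtype so the Windows default C long is never used; this is just a sketch, not necessarily what the repo ended up doing:

```python
import numpy as np

# np.random.randint defaults to a 32-bit C long on Windows, so a high
# value of 2**32 - 1 overflows; asking for int64 explicitly avoids that.
seed = int(np.random.randint(0, 2 ** 32 - 1, dtype=np.int64))
print(seed)
```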

lludlow commented 3 months ago
> Seed randomization via -1

Do we always want a random seed? A random seed button would give you more control.

d8ahazard commented 3 months ago
> Seed randomization via -1
>
> Do we always want a random seed? A random seed button would give you more control.

Could be a good idea. I was just emulating how AUTO does it, where -1 == "randomized"; otherwise, put in whatever seed you want.
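
As a rough sketch of that convention (the helper name is hypothetical, not the PR's exact code):

```python
import numpy as np


def resolve_seed(user_seed: int) -> int:
    # -1 means "randomize", mirroring the AUTO convention; any other
    # value is used as-is so results stay reproducible.
    if user_seed == -1:
        return int(np.random.randint(0, 2 ** 32 - 1, dtype=np.int64))
    return user_seed
```

A dedicated randomize button, as suggested above, could simply draw a fresh value and write it into the same seed field.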

xhoxye commented 3 months ago

The .safetensors model should use fp16, otherwise 8 GB of VRAM will not be enough.
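
For context, one way to produce an fp16 copy of a local checkpoint with diffusers (just a sketch; the paths are placeholders, and the output is a diffusers-format folder loadable with from_pretrained rather than a single file):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Reload an fp32 single-file checkpoint in half precision and save it
# back out; the result is a diffusers-format folder, not a single file.
pipe = StableDiffusionXLPipeline.from_single_file(
    "models/checkpoints/my_model.safetensors",  # placeholder path
    torch_dtype=torch.float16,
)
pipe.save_pretrained("models/checkpoints/my_model_fp16", safe_serialization=True)
```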

jtabox commented 3 months ago

Hi, I've been using this improved branch for a while, thanks for your work. I don't have any bug to report, just a question I can't figure out: when loading local SDXL models, some of them take forever at the Load to GPU: UNet2DConditionModel stage, while others load quickly, and I can't tell why.

For example:

Using the default RunDiffusion/Juggernaut-X-v10 model:

Load to GPU: UNet2DConditionModel loads at 1.36 iterations/sec, and takes less than 20 secs.

While using a local juggernautXL_juggernautX.safetensors:

Load to GPU: UNet2DConditionModel loads at 100.39 seconds/iteration and I have to cancel it because it takes more than an hour.

This isn't happening with all my local models, some do it others don't. Is there anything I can do to change this? I'm running on a 3080 with 10 GB.

Dibucci commented 3 months ago

I'm new to this, but I'm loving how this works; it's honestly amazing imo. The only issue I seem to have is that I have no clue how to code, or where/what to look for in the files to change which model it's using. I'm currently running this in a Python env, but as someone who doesn't know code (well, besides how to set up a "simple" env), I'm wondering if there are any QOL improvements that would include a dropdown for model selection pointing to a folder we can just put compatible models into?

I already use SD with comfyUI and other things, and would love to see how this works on some of my own SDXL merges I have worked on, but I'm at a loss for how to even do it.

jtabox commented 3 months ago

> I'm new to this ... snip ... I'm at a loss for how to even do it.

You're probably not using the specific fork this comment chain is about? These comments are for another fork (https://github.com/runnitai/Omost/tree/main), not lllyasviel's main repo. That fork already has some QoL improvements implemented, and among them is what you're asking for, i.e. a dropdown menu for choosing models you've put into a folder.

The easiest way to use that fork is to create a new folder and follow the same instructions you used for the main repo, but change the git clone command to point to the other repository:

  1. git clone https://github.com/runnitai/Omost.git
  2. conda create -n omost python=3.10 && conda activate omost && cd Omost
  3. pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
  4. pip install -r requirements.txt
  5. python gradio_app.py

Inside the Omost folder you should see the subfolder models and, inside it, checkpoints. Put any models you'd like to use in the latter. Inside the UI there's a Refresh Models button; press it and your models should become available from the dropdown menu.
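
For the curious, the refresh behaviour amounts to rescanning that folder and repopulating the dropdown, roughly like this (a sketch with hypothetical names, not the fork's actual code):

```python
import os

import gradio as gr

CHECKPOINT_DIR = "models/checkpoints"  # assumed location, per the steps above


def list_checkpoints() -> list[str]:
    # Collect every .safetensors file the user has dropped into the folder.
    if not os.path.isdir(CHECKPOINT_DIR):
        return []
    return sorted(f for f in os.listdir(CHECKPOINT_DIR) if f.endswith(".safetensors"))


with gr.Blocks() as demo:
    model_dd = gr.Dropdown(choices=list_checkpoints(), label="Checkpoint")
    refresh = gr.Button("Refresh Models")
    # Rescan the folder and push the new choices into the dropdown.
    refresh.click(lambda: gr.update(choices=list_checkpoints()), outputs=model_dd)

demo.launch()
```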

Edit: as you might have read in my previous comment, some local models seem to take an extremely long time to load, making the process practically impossible. I haven't figured out what causes this yet. I have a 3080 so loading models and generating images isn't an issue generally, so it's probably not VRAM related. Not to mention that I can run those same models normally in A1111 & ComfyUI.

Dibucci commented 3 months ago

ty @jtabox, I've never been good with git stuff, this helps a lot. I'll try this shortly.

Now to see if it works with a symlink, so I don't have to physically move stuff.

Dibucci commented 3 months ago

> Edit: as you might have read in my previous comment, some local models seem to take an extremely long time to load, making the process practically impossible. I haven't figured out what causes this yet. I have a 3080 so loading models and generating images isn't an issue generally, so it's probably not VRAM related. Not to mention that I can run those same models normally in A1111 & ComfyUI.

I see what you mean. I just tried three different models (I'm trying not to use Pony; it's good but I hate the "score" system). EchoAlpha didn't load, the new Compassmix, which is an SDXL Lightning model, just hung, and Furry_XL (Seart's furry model that was used to make Compassmix) worked, but it was moving really slowly, at about 90-110 seconds per iteration. On SD WebUIs I usually get about 2-4 iterations per second at 1024x1024, and at least 1.2 if I go as high as 1440x1440, which is rare, but sometimes I get lucky at that resolution with a good image.

I'm going to try it with Pony, but I've also noticed there are different LLMs to select from as well. Is the one selected by default when I load up this fork the same one used in the main branch? I'm only asking because with the main branch it generates at normal speed. But I guess I'd better try RealVision first, since I haven't tried pure defaults yet; I went straight to putting in my own model.

I have an RTX 4070 Ti OC with 12 GB of VRAM. The OC means it was overclocked from the factory, according to the package, and no, I don't know how to overclock or what it's clocked to, so whatever it's set to is staying there. I won't touch something I don't understand.

EDIT: so, the default RealVisXL V4.0 works fine, about 2 it/s, which is fine and dandy. But so far every other model I've tried, as mentioned above, either hangs or takes forever and my whole system bogs down, like my PC was lagging at around 10 fps for everything.

Dibucci commented 3 months ago

So, I followed your instructions, but I don't know if I'm doing something wrong or what at this point...

I sat here for about 15 minutes and still couldn't get one iteration with the base PonyDiffusionXL model... the ones the program downloads itself work fine, so I don't know.

When it hangs like that I don't know what else to do, so I just Ctrl+C in the command window and it closes the running instance. I'm going to try downloading the models fresh from CivitAI and see if maybe it's something to do with the models I moved from my SD WebUI folder to this program's folder.

If it doesn't work, then maybe I did something wrong or maybe there's some weird incompatibility with my card. And if it does work with the same model re-downloaded straight from CivitAI into this program's checkpoints folder, then I'm just going to be even more confused, wondering if I should delete and re-download all my models because somehow maybe they are the thing that's broken. xD

This is what's going on in the console when it tries to generate the image:


```
Load to GPU: LlamaForCausalLM
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
G:\Omost\Omost 2\Omost\lib\site-packages\transformers\models\llama\modeling_llama.py:649: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
You shouldn't move a model that is dispatched using accelerate hooks.
Unload to CPU: LlamaForCausalLM
Load to GPU: CLIPTextModel
Load to GPU: CLIPTextModel
Unload to CPU: CLIPTextModel
Unload to CPU: CLIPTextModel
Load to GPU: UNet2DConditionModel
  0%|                                        | 0/25 [00:00<?, ?it/s]
```

jtabox commented 3 months ago

> so, I followed your instructions, but idk if "I'm" doing something wrong or what at this point... ... snip ... Load to GPU: UNet2DConditionModel

Aye, that's the exact point I'm stuck at, regarding some models. Some local models load fine, but others take so much time at this step (Load to GPU: UNet2DConditionModel) that I have to Ctrl+C. For the models that work, this process takes mere seconds, but for the models that don't, it's extremely slow and unusable.

Unfortunately my knowledge regarding coding these things kinda stops here, so I can't even figure out if it's some setting I'm missing or if it's because I lack GPU memory. Though I don't think it's the latter; those same models load fine in Forge, Fooocus & ComfyUI.

But yeah, that was the reason I posted initially in this thread, any suggestions would be very welcome.

d8ahazard commented 1 month ago

@lllyasviel - I see you're on a tear doing updates to Forge and Fooocus. Maybe take a look at this PR and consider merging it?