nod-ai / SHARK-Studio

SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
Apache License 2.0

*539.exe always errors out #1042

Closed: consolation1 closed this issue 1 year ago

consolation1 commented 1 year ago

Wget is working correctly, as per a suggestion in a similar issue. (Windows 11 for Workstations, 32 GB RAM, RX 6800 XT)

Command prompt output below:

C:\SD>shark_sd_20230216_539.exe
shark_tank local cache is located at C:\Users\consolation.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
vulkan devices are available.
cuda devices are not available.
Running on local URL: http://0.0.0.0:8080

To create a public link, set share=True in launch().
Found device AMD Radeon RX 6800 XT. Using target triple rdna2-unknown-windows.
Using tuned models for Linaqruf/anything-v3.0/fp16/vulkan://00000000-2d00-0000-0000-000000000000.
Downloading (…)cheduler_config.json: 100%|█████████████████████| 341/341 [00:00<00:00, 341kB/s]
huggingface_hub\file_download.py:129: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\consolation.cache\huggingface\diffusers. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
torch\jit\_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in __init__. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in torch.jit.Attribute.
  warnings.warn("The TorchScript type system doesn't support ")
No vmfb found. Compiling and saving to C:\SD\euler_scale_model_input_1_512_512fp16.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\SD\euler_scale_model_input_1_512_512fp16.vmfb.
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
No vmfb found. Compiling and saving to C:\SD\euler_step_1_512_512fp16.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\SD\euler_step_1_512_512fp16.vmfb.
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Inferring base model configuration.
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with: pip install accelerate.
Downloading (…)_pytorch_model.bin: 100%|████████████████| 3.44G/3.44G [01:26<00:00, 39.9MB/s]
Downloading (…)ain/unet/config.json: 100%|█████████████████████| 901/901 [00:00<00:00, 901kB/s]
Retrying with a different base model configuration
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with: pip install accelerate.
torch\fx\node.py:250: UserWarning: Trying to prepend a node to itself. This behavior has no effect on the graph.
  warnings.warn("Trying to prepend a node to itself. This behavior has no effect on the graph.")
Loading Winograd config file from C:\Users\consolation.local/shark_tank/configs/unet_winograd_vulkan.json
Retrying with a different base model configuration
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with: pip install accelerate.
Retrying with a different base model configuration
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with: pip install accelerate.
Retrying with a different base model configuration
Traceback (most recent call last):
  File "gradio\routes.py", line 374, in run_predict
  File "gradio\blocks.py", line 1017, in process_api
  File "gradio\blocks.py", line 835, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "apps\stable_diffusion\scripts\txt2img.py", line 116, in txt2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 220, in from_pretrained
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 383, in __call__
SystemExit: Cannot compile the model. Please create an issue with the detailed log at https://github.com/nod-ai/SHARK/issues
Keyboard interruption in main thread... closing server.

Happy to try any suggestions, TIA

yzhang93 commented 1 year ago

Can you clear all the generated files in the folder? Or simply add the --clear_all flag.
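
For reference, the invocation with that flag would look like this (the exe name is taken from the logs above, run from the folder containing it):

C:\SD>shark_sd_20230216_539.exe --clear_all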

consolation1 commented 1 year ago

It has no effect, other than running the initial downloads again.

yzhang93 commented 1 year ago

Can you try with the --no-use_tuned flag?

consolation1 commented 1 year ago

> Can you try with the --no-use_tuned flag?

Same error, I think. Looking at the board, there seem to be a bunch of issue threads with the same problem?

C:\SD>shark_sd_20230216_539.exe --no-use_tuned
shark_tank local cache is located at C:\Users\consolation.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
vulkan devices are available.
cuda devices are not available.
Running on local URL: http://0.0.0.0:8080

To create a public link, set share=True in launch().
Found device AMD Radeon RX 6800 XT. Using target triple rdna2-unknown-windows.
Using tuned models for Linaqruf/anything-v3.0/fp16/vulkan://00000000-2d00-0000-0000-000000000000.
torch\jit\_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in __init__. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in torch.jit.Attribute.
  warnings.warn("The TorchScript type system doesn't support ")
loading existing vmfb from: C:\SD\euler_scale_model_input_1_512_512fp16.vmfb
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
loading existing vmfb from: C:\SD\euler_step_1_512_512fp16.vmfb
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Inferring base model configuration.
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with: pip install accelerate.
Retrying with a different base model configuration
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with: pip install accelerate.
torch\fx\node.py:250: UserWarning: Trying to prepend a node to itself. This behavior has no effect on the graph.
  warnings.warn("Trying to prepend a node to itself. This behavior has no effect on the graph.")
Loading Winograd config file from C:\Users\consolation.local/shark_tank/configs/unet_winograd_vulkan.json
Retrying with a different base model configuration
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with: pip install accelerate.
Retrying with a different base model configuration
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with: pip install accelerate.
Retrying with a different base model configuration
Traceback (most recent call last):
  File "gradio\routes.py", line 374, in run_predict
  File "gradio\blocks.py", line 1017, in process_api
  File "gradio\blocks.py", line 835, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "apps\stable_diffusion\scripts\txt2img.py", line 116, in txt2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 220, in from_pretrained
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 383, in __call__
SystemExit: Cannot compile the model. Please create an issue with the detailed log at https://github.com/nod-ai/SHARK/issues

consolation1 commented 1 year ago

Attached output with both flags running: clearallnotune.txt

yzhang93 commented 1 year ago

Have you ever had any of the previous releases working? Might be a driver issue @powderluv?

consolation1 commented 1 year ago

> Have you ever had any of the previous releases working? Might be a driver issue @powderluv?

I'm using the currently linked driver, which is the new AMD one. I had it working previously on Debian, but the latest build broke it, although there's a chance Debian Testing hosed it. Hence the attempted switch to a Windows version while I try to unravel that mess...

I tried using the 1.4 SD model from Hugging Face in the custom models directory and got a somewhat different output:

C:\SD>shark_sd_20230216_539.exe --no-use_tuned --clear_all
shark_tank local cache is located at C:\Users\consolation.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
CLEARING ALL, EXPECT SEVERAL MINUTES TO RECOMPILE
vulkan devices are available.
cuda devices are not available.
Running on local URL: http://0.0.0.0:8080

To create a public link, set share=True in launch().
Found device AMD Radeon RX 6800 XT. Using target triple rdna2-unknown-windows.
Tuned models are currently not supported for this setting.
torch\jit\_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in __init__. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in torch.jit.Attribute.
  warnings.warn("The TorchScript type system doesn't support ")
No vmfb found. Compiling and saving to C:\SD\euler_scale_model_input_1_512_512fp16.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\SD\euler_scale_model_input_1_512_512fp16.vmfb.
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
No vmfb found. Compiling and saving to C:\SD\euler_step_1_512_512fp16.vmfb
Using target triple -iree-vulkan-target-triple=rdna2-unknown-windows from command line args
Saved vmfb in C:\SD\euler_step_1_512_512fp16.vmfb.
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
Diffusers' checkpoint will be identified here : C:/SD/models/model
Loading diffusers' pipeline from original stable diffusion checkpoint
Traceback (most recent call last):
  File "gradio\routes.py", line 374, in run_predict
  File "gradio\blocks.py", line 1017, in process_api
  File "gradio\blocks.py", line 835, in call_function
  File "anyio\to_thread.py", line 31, in run_sync
  File "anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 867, in run
  File "apps\stable_diffusion\scripts\txt2img.py", line 116, in txt2img_inf
  File "apps\stable_diffusion\src\pipelines\pipeline_shark_stable_diffusion_utils.py", line 220, in from_pretrained
  File "apps\stable_diffusion\src\models\model_wrappers.py", line 340, in __call__
  File "apps\stable_diffusion\src\utils\utils.py", line 427, in preprocessCKPT
  File "diffusers\pipelines\stable_diffusion\convert_from_ckpt.py", line 976, in load_pipeline_from_original_stable_diffusion_ckpt
  File "torch\serialization.py", line 810, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "torch\serialization.py", line 1173, in _load
    result = unpickler.load()
  File "torch\serialization.py", line 1166, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'pytorch_lightning'
Keyboard interruption in main thread... closing server.
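
For what it's worth, the final ModuleNotFoundError points at a missing dependency rather than a compile failure: original .ckpt checkpoints are pickled with references to pytorch_lightning classes, so torch.load cannot unpickle them unless that module is importable. In a source checkout the likely fix would be to install it into the environment; the packaged exe bundles its own environment, so this is only a sketch of the underlying cause:

pip install pytorch-lightning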

consolation1 commented 1 year ago

The AMD driver works with A1111's SD build, FWIW.

yzhang93 commented 1 year ago

Did you try selecting a model from the drop-down box? Can you check whether "stabilityai/stable-diffusion-2-1-base" works?

consolation1 commented 1 year ago

> Did you try selecting a model from the drop-down box? Can you check whether "stabilityai/stable-diffusion-2-1-base" works?

Yeah, I tried all of them, same error.

consolation1 commented 1 year ago

Could it be the venv environment in the exe conflicting with the installed Python? Or am I way off base here?

yzhang93 commented 1 year ago

Hmm, can you check whether this folder was created: C:\Users\consolation.local/shark_tank/, and whether anything is saved in it? Alternatively, are you able to check out our GitHub repo and try the latest from the command line? There are instructions in the README.

yzhang93 commented 1 year ago

> Could it be the venv environment in the exe conflicting with the installed Python? Or am I way off base here?

Also, are you using Python 3.11?

consolation1 commented 1 year ago

Did you mean C:\Users\consolation\.local\shark_tank\ or consolation.local? Either way, no, neither has been created.

I initially tried it with no Python installed, as I don't really use it on Windows and it isn't listed as a prerequisite for the exe; I figured everything is contained in the exe. When I hit this error I did try installing the latest 3.11, just in case, but it didn't help. At that stage I tried A1111's build, but that required Python 3.10.6, so I removed 3.11 and switched to 3.10.6. I can spin up a clean Windows install and try 3.11 via the GitHub install, but that will have to wait till this evening.

The main reason I'm trying to get SHARK running via the exe is that I need to set it up for a couple of "not super tech savvy" users, and its self-contained nature is very appealing.
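
A quick way to tell which of those two candidate folders exists is a pair of dir commands from a command prompt (a generic sketch; 2>nul just hides the error for a missing folder):

dir "C:\Users\consolation\.local\shark_tank" 2>nul
dir "C:\Users\consolation.local\shark_tank" 2>nul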

yzhang93 commented 1 year ago

So maybe the local shark_tank folder is not being created properly? Can you try setting it to another directory with the --local_tank_cache= flag?
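
For example, something like the following, where C:\SD\shark_tank is only an illustrative path and any writable directory should work:

C:\SD>shark_sd_20230216_539.exe --local_tank_cache="C:\SD\shark_tank"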

consolation1 commented 1 year ago

You genius - you did it! :-)

consolation1 commented 1 year ago

Here is the console output from the first run. It took a hot minute as expected, but worked correctly: setcache1strun.txt

yzhang93 commented 1 year ago

Awesome! Thanks for your interest and patience!

consolation1 commented 1 year ago

All models and custom ones work correctly, btw. Should I mark this as closed? I'm guessing that a bunch of the other threads can be fixed with this. Also, glad I could help; if you ever need a crash test dummy for builds, let me know.