pianogospel opened 1 month ago
I would like to add that I have now done a new install, twice, and I believe I have the exact same issue as pianogospel. Here is what the command window shows:
(venv) C:\AI\ai-toolkit>python run.py config/JL_lora_flux_24gb.yaml
Running 1 job
C:\AI\ai-toolkit\venv\lib\site-packages\albumentations\__init__.py:13: UserWarning: A new version of Albumentations is available: 1.4.18 (you have 1.4.15). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
check_for_updates()
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\mediapipe_face\mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe'
warnings.warn(
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_11m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_11m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_384 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_384. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_512 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
{
    "type": "sd_trainer",
    "training_folder": "output",
    "device": "cuda:0",
    "network": {
        "type": "lora",
        "linear": 16,
        "linear_alpha": 16
    },
    "save": {
        "dtype": "float16",
        "save_every": 250,
        "max_step_saves_to_keep": 4,
        "push_to_hub": false
    },
    "datasets": [
        {
            "folder_path": "C:\AI\Lora\jennytestFlux",
            "caption_ext": "txt",
            "caption_dropout_rate": 0.05,
            "shuffle_tokens": false,
            "cache_latents_to_disk": true,
            "resolution": [
                512,
                768,
                1024
            ]
        }
    ],
    "train": {
        "batch_size": 1,
        "steps": 2000,
        "gradient_accumulation_steps": 1,
        "train_unet": true,
        "train_text_encoder": false,
        "gradient_checkpointing": true,
        "noise_scheduler": "flowmatch",
        "optimizer": "adamw8bit",
        "lr": 0.0001,
        "ema_config": {
            "use_ema": true,
            "ema_decay": 0.99
        },
        "dtype": "bf16"
    },
    "model": {
        "name_or_path": "black-forest-labs/FLUX.1-dev",
        "is_flux": true,
        "quantize": true,
        "low_vram": true
    },
    "sample": {
        "sampler": "flowmatch",
        "sample_every": 250,
        "width": 1024,
        "height": 1024,
        "prompts": [
            "JL, a woman holding a coffee cup, in a beanie, sitting at a cafe",
            "a woman holding a coffee cup, in a beanie, sitting at a cafe"
        ],
        "neg": "",
        "seed": 42,
        "walk_seed": true,
        "guidance_scale": 4,
        "sample_steps": 20
    }
}
Using EMA
C:\AI\ai-toolkit\extensions_built_in\sd_trainer\SDTrainer.py:61: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self.scaler = torch.cuda.amp.GradScaler()
#############################################
#############################################
Running 1 process
Loading Flux model
Loading transformer
Quantizing transformer
C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py:380: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
INFO: Could not find files for the given pattern(s).
Error running job: Command '['where', 'cl']' returned non-zero exit status 1.
========================================
Result:
Traceback (most recent call last):
  File "C:\AI\ai-toolkit\run.py", line 90, in <module>
@Anothergazz Do you sport a 3090 by any chance? Just a guess from spotting quanto and fp8 in your error log.
Ampere does not support float8 and qfloat8 is hardcoded in toolkit/stable_diffusion_model.py. For example: https://github.com/ostris/ai-toolkit/blob/ce759ebd8c653a5ac61c15c1bdacb210aa37df9e/toolkit/stable_diffusion_model.py#L611C39-L611C64 Changing all quantization to qint8 worked for me. Maybe this could be made configurable at some point.
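For anyone who wants to try that swap before it becomes configurable, here is a minimal sketch of the idea using optimum-quanto's public API (qint8 and qfloat8 are real quanto qtypes; the toy `transformer` below is a hypothetical stand-in for the FLUX transformer ai-toolkit loads, and the exact call site in stable_diffusion_model.py may differ):

import torch.nn as nn
from optimum.quanto import quantize, freeze, qint8

# hypothetical stand-in for the model the toolkit loads
transformer = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

# quantize weights with qint8 (which, per the comment above, Ampere cards
# can handle) instead of the qfloat8 currently hardcoded in the toolkit
quantize(transformer, weights=qint8)
freeze(transformer)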
My error is this:
Thanks for the suggestion, but I splashed out on a 4090 (in a moment of weakness) on Windows 11.
On Windows: I too can confirm that all builds published in the last few days crash after "Quantizing transformer". I have a 4090 and 32 GB of RAM. I have tested various versions of the NVIDIA driver, CUDA, Python, Visual Studio, torch, and environment settings, all of which seem to work perfectly fine on their own.
Request: I would be grateful if anyone could mention the latest build that worked for them on Windows.
I like the fact that this is very experimental, you guys are doing great, take your time, and thanks!
my error:
Running 1 process
Loading Flux model
Loading transformer
Quantizing transformer
Error running job: Error building extension 'quanto_cuda': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ai-toolkit\venv\Lib\site-packages\torch\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\TH -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Python311\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
FAILED: gemm_cuda.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ai-toolkit\venv\Lib\site-packages\torch\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\TH -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Python311\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(95): error: identifier "asm" is undefined
    asm volatile(
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(98): error: expected a ")"
    : "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(104): error: identifier "asm" is undefined
    asm volatile(
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(107): error: expected a ")"
    : "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(126): error: identifier "asm" is undefined
    asm volatile(
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(129): error: expected a ")"
    : "=f"(((float *)C_warp)[0]), "=f"(((float *)C_warp)[1]), "=f"(((float *)C_warp)[2]), "=f"(((float *)C_warp)[3])
    ^
6 errors detected in the compilation of "C:/ai-toolkit/venv/Lib/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu".
gemm_cuda.cu
ninja: build stopped: subcommand failed.
========================================
Result:
- 0 completed jobs
- 1 failure
Traceback (most recent call last):
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 2105, in _run_ninja_build
    subprocess.run(
  File "C:\Python311\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\ai-toolkit\run.py", line 90, in <module>
    main()
  File "C:\ai-toolkit\run.py", line 86, in main
    raise e
  File "C:\ai-toolkit\run.py", line 78, in main
    job.run()
  File "C:\ai-toolkit\jobs\ExtensionJob.py", line 22, in run
    process.run()
  File "C:\ai-toolkit\jobs\process\BaseSDTrainProcess.py", line 1241, in run
    self.sd.load_model()
  File "C:\ai-toolkit\toolkit\stable_diffusion_model.py", line 613, in load_model
    transformer.to(self.device_torch)
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
  File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 273, in __torch_function__
    return func(*args, **kwargs)
  File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 299, in __torch_dispatch__
    return WeightQBytesTensor.create(
  File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 140, in create
    return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
  File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\qbits.py", line 80, in __init__
    data_packed = MarlinF8PackedTensor.pack(data)  # pack fp8 data to in32, and apply marlin re-ordering.
  File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\packed.py", line 183, in pack
    data_int32 = torch.ops.quanto.pack_fp8_marlin(
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\_ops.py", line 1061, in __call__
    return self._op(*args, **(kwargs or {}))
  File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\__init__.py", line 164, in gptq_marlin_repack
    return ext.lib.gptq_marlin_repack(b_q_weight, perm, size_k, size_n, num_bits)
  File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\extension.py", line 42, in lib
    self._lib = load(
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 1312, in load
    return _jit_compile(
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 1722, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 1834, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 2121, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'quanto_cuda': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ai-toolkit\venv\Lib\site-packages\torch\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\TH -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Python311\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
FAILED: gemm_cuda.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ai-toolkit\venv\Lib\site-packages\torch\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\TH -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Python311\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(95): error: identifier "asm" is undefined
    asm volatile(
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(98): error: expected a ")"
    : "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(104): error: identifier "asm" is undefined
    asm volatile(
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(107): error: expected a ")"
    : "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(126): error: identifier "asm" is undefined
    asm volatile(
    ^
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(129): error: expected a ")"
    : "=f"(((float *)C_warp)[0]), "=f"(((float *)C_warp)[1]), "=f"(((float *)C_warp)[2]), "=f"(((float *)C_warp)[3])
    ^
6 errors detected in the compilation of "C:/ai-toolkit/venv/Lib/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu".
gemm_cuda.cu
ninja: build stopped: subcommand failed.
(venv) C:\ai-toolkit>python
Python 3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import numpy
>>> print(torch.__version__)
2.4.1+cu121
>>> print(numpy.__version__)
1.26.3
I think I was running into the same issue, also with a 4090 and 32 GB of RAM. I was looking for something to correct, since I'd been messing with CUDA toolkit and torch versions for a few hours, and ran into this comment: https://github.com/ostris/ai-toolkit/issues/169#issuecomment-2406105305 --- I downgraded to optimum-quanto 0.2.4, the errors are gone, and the batch is running now. 10% in, but hoping it will be fine.
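To confirm the downgrade actually took effect in your venv, a quick stdlib check (works for any package):

from importlib.metadata import version

# should print 0.2.4 after the downgrade
print(version("optimum-quanto"))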
Same issue here on a 3090. Previous version of ai-toolkit still working fine.
SUCCESS! Your suggestion worked. I trained on 70 images for over 4 hours and I am very happy with the outcome. I'll list some details of my build for those still seeking answers.
The latest build as of October 14th worked for me (RTX 4090 + 32 GB RAM) using: CUDA 12.6.2, NVIDIA driver 565.90, cuDNN 9.5.0.
You may need to adjust some directories (System Properties > Advanced > Environment Variables) if the above installers have not done it automatically; ask an AI to explain if you're unsure.
You also need Visual Studio 2022 (not just the Build Tools). Ensure that "MSVC v143 - VS 2022 C++ x64/x86 build tools" (or v142 for VS 2019) is selected during installation. Under "Individual components", ensure you have "C++ CMake tools for Windows" and "Windows 10 SDK (10.0.19041)" checked.
This is the list of all installed packages and their versions within the environment. No more than a couple of them may have been unnecessarily added or altered during troubleshooting. If you don't know what to do with this list, give it to an AI and ask it to help you compare it with yours:
absl-py==2.1.0 accelerate==1.0.0 aiofiles==23.2.1 albucore==0.0.16 albumentations==1.4.15 annotated-types==0.7.0 antlr4-python3-runtime==4.9.3 anyio==4.6.0 attrs==24.2.0 bitsandbytes==0.44.1 certifi==2024.8.30 charset-normalizer==3.4.0 clean-fid==0.1.35 click==8.1.7 clip-anytorch==2.6.0 colorama==0.4.6 controlnet-aux==0.0.7 dctorch==0.1.2 diffusers @ git+https://github.com/huggingface/diffusers.git@38a3e4df926c59bc122191c0fc8066755e98b6d2 docker-pycreds==0.4.0 einops==0.8.0 eval_type_backport==0.2.0 fastapi==0.115.0 ffmpy==0.4.0 filelock==3.13.1 flatten-json==0.1.14 fsspec==2024.2.0 ftfy==6.2.3 gitdb==4.0.11 GitPython==3.1.43 gradio==5.0.1 gradio_client==1.4.0 grpcio==1.66.2 h11==0.14.0 hf_transfer==0.1.8 httpcore==1.0.6 httpx==0.27.2 huggingface-hub==0.25.2 idna==3.10 imageio==2.35.1 importlib_metadata==8.5.0 invisible-watermark==0.2.0 Jinja2==3.1.3 jsonmerge==1.9.2 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 k-diffusion==0.1.1.post1 kornia==0.7.3 kornia_rs==0.1.5 lazy_loader==0.4 lpips==0.1.4 lycoris-lora==1.8.3 Markdown==3.7 markdown-it-py==3.0.0 MarkupSafe==2.1.5 mdurl==0.1.2 mpmath==1.3.0 networkx==3.2.1 ninja==1.11.1.1 numpy==1.26.3 omegaconf==2.3.0 open_clip_torch==2.26.1 opencv-python==4.10.0.84 opencv-python-headless==4.10.0.84 optimum-quanto==0.2.4 orjson==3.10.7 oyaml==1.0 packaging==24.1 pandas==2.2.3 peft==0.13.1 pillow==10.2.0 platformdirs==4.3.6 prodigyopt==1.0 protobuf==5.28.2 psutil==6.0.0 pydantic==2.9.2 pydantic_core==2.23.4 pydub==0.25.1 Pygments==2.18.0 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 python-multipart==0.0.12 python-slugify==8.0.4 pytorch-fid==0.3.0 pytz==2024.2 PyWavelets==1.7.0 PyYAML==6.0.2 referencing==0.35.1 regex==2024.9.11 requests==2.32.3 rich==13.9.2 rpds-py==0.20.0 ruff==0.6.9 safetensors==0.4.5 scikit-image==0.24.0 scipy==1.14.1 semantic-version==2.10.0 sentencepiece==0.2.0 sentry-sdk==2.16.0 setproctitle==1.3.3 shellingham==1.5.4 six==1.16.0 smmap==5.0.1 sniffio==1.3.1 starlette==0.38.6 sympy==1.12 tensorboard==2.18.0 tensorboard-data-server==0.7.2 text-unidecode==1.3 tifffile==2024.9.20 timm==1.0.9 tokenizers==0.20.1 toml==0.10.2 tomlkit==0.12.0 torch==2.4.1+cu121 torchdiffeq==0.2.4 torchsde==0.2.6 torchvision==0.19.1+cu121 tqdm==4.66.5 trampoline==0.1.2 transformers==4.45.2 typer==0.12.5 typing_extensions==4.9.0 tzdata==2024.2 urllib3==2.2.3 uvicorn==0.31.1 wandb==0.18.3 wcwidth==0.2.13 websockets==12.0 Werkzeug==3.0.4 zipp==3.20.2
IMPORTANT: As the amazing person above has advised us, optimum-quanto 0.2.5 must be downgraded to optimum-quanto==0.2.4.
I don't know whether this next command is what enabled my build to work, but here it is. After entering your environment, and before you run the final command to begin training, you can enter this to circumvent a related warning. Your GPU may require a different number; the snippet below shows how to find it: set TORCH_CUDA_ARCH_LIST=8.9
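If you are unsure which number your card needs, plain torch can tell you (a minimal lookup; 8.9 is a 4090/Ada, 8.6 is a 3090/Ampere):

import torch

# compute capability of the first visible GPU, in the format
# TORCH_CUDA_ARCH_LIST expects (e.g. "8.9" on a 4090)
major, minor = torch.cuda.get_device_capability(0)
print(f"{major}.{minor}")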
This was all done in Windows CMD in administrator mode to allow caching of all downloads, which is useful for reattempting the installation without downloading the same things again.
Finally, at some point the build failed because of a "long path" type of error. Simply install this project close to your drive's root.
Thanks everyone!
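For anyone comparing their setup against the list above, here is a quick fingerprint using only torch (all of these attributes exist in torch 2.4; the outputs in the comments are examples, not requirements):

import torch

print(torch.__version__)              # e.g. 2.4.1+cu121
print(torch.version.cuda)             # CUDA version torch was built against
print(torch.cuda.is_available())      # should be True
print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 4090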
One more question: how to downgrade optimum-quanto from 0.2.5 to 0.2.4?
either pip install optimum-quanto==0.2.4
or change the line in requirements.txt. Pip will tell you if you have dependencies that mismatch; you'll need to adjust them.
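If you take the requirements.txt route, the change is just pinning the line (a sketch, assuming the file currently lists the package unpinned):

optimum-quanto==0.2.4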
Thanks @inflamously, works flawlessly.
Hi, after the update it stopped working. I deleted everything, cloned it again, and installed everything using optimum-quanto==0.2.4, but now I have a different error, as shown in the screenshot. Does anyone have any suggestions to resolve this?
Downgrade the timm module: pip install timm==1.0.8 and it'll work again. @ironico
Should all dependencies in requirements.txt be frozen? That would avoid future problems like this.
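For example, pinning just the two packages that broke in this thread (versions taken from the comments above) would have avoided both failures:

optimum-quanto==0.2.4
timm==1.0.8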
Hi @dene-, thank you! It works now.
I used pip install "timm<=0.9.5"
I need help. I tried to get help on Discord but nobody answered. I have been using ai-toolkit since August and it worked flawlessly. I tried to install the newer version on Windows (not an update, a fresh installation from the beginning) and I followed all the steps, but it simply doesn't work and shows multiple error messages. Can anyone tell me what the prerequisites are for the latest version of ai-toolkit? I have Python 3.10.11, NVIDIA CUDA toolkit cuda_11.8.r11.8, Visual Studio 2022, and cl.exe in the environment variables, but it doesn't work. Thanks for any help.