Error: fp16 mixed precision requires a GPU; exit status 1

cheesesteak45 commented 1 year ago

I'm trying to train on an AMD RX 6600 XT on Arch Linux. xformers is disabled. When the startup script prompts for which GPU to use, I've tried entering "all", "[all]", "0", and "1", but none of them makes the script recognize my GPU, and I get ValueError: fp16 mixed precision requires a GPU every time. Any idea what the issue could be?

The full output is as follows:


2023-03-02 17:54:36.961454: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-02 17:54:37.454232: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-03-02 17:54:37.454271: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-03-02 17:54:37.454277: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-02 17:54:39.110471: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-02 17:54:39.580158: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-03-02 17:54:39.580198: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-03-02 17:54:39.580204: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 1.13.1+cu117 with CUDA 1107 (you have 2.0.0.dev20230128+cu118)
    Python  3.10.9 (you have 3.10.9)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
/home/steve/Desktop/stable-diffusion-webui/LoRA_Easy_Training_Scripts/training_image_folder/image_dir/
10_training
Created a txt file named last.txt in the output folder
prepare tokenizer
update token length: 150
Use DreamBooth method.
prepare train images.
found directory 10_training contains 25 image files
250 train images with repeating.
loading image sizes.
100%|█████████████████████████████| 25/25 [00:00<00:00, 7339.89it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (448, 576), count: 20
bucket 1: resolution (512, 512), count: 90
bucket 2: resolution (576, 448), count: 80
bucket 3: resolution (640, 384), count: 60
mean ar error (without repeats): 0.07514270473459406
prepare accelerator
╭─────────────── Traceback (most recent call last) ────────────────╮
│ /home/steve/Desktop/stable-diffusion-webui/LoRA_Easy_Training_Sc │
│ ripts/main.py:143 in <module>                                    │
│                                                                  │
│   140                                                            │
│   141                                                            │
│   142 if __name__ == "__main__":                                 │
│ ❱ 143 │   main()                                                 │
│   144                                                            │
│                                                                  │
│ /home/steve/Desktop/stable-diffusion-webui/LoRA_Easy_Training_Sc │
│ ripts/main.py:80 in main                                         │
│                                                                  │
│    77 │                                                          │
│    78 │   args = parser.create_args(ArgStore.change_dict_to_inte │
│    79 │   # print(args)                                          │
│ ❱  80 │   train_network.train(args)                              │
│    81                                                            │
│    82                                                            │
│    83 def ensure_file_paths(args: dict) -> None:                 │
│                                                                  │
│ /home/steve/Desktop/stable-diffusion-webui/LoRA_Easy_Training_Sc │
│ ripts/sd_scripts/train_network.py:90 in train                    │
│                                                                  │
│    87                                                            │
│    88   # acceleratorを準備する                                  │
│    89   print("prepare accelerator")                             │
│ ❱  90   accelerator, unwrap_model = train_util.prepare_accelerat │
│    91                                                            │
│    92   # mixed precisionに対応した型を用意しておき適宜castする  │
│    93   weight_dtype, save_dtype = train_util.prepare_dtype(args │
│                                                                  │
│ /home/steve/.local/lib/python3.10/site-packages/library/train_ut │
│ il.py:1817 in prepare_accelerator                                │
│                                                                  │
│   1814 │   log_prefix = "" if args.log_prefix is None else args. │
│   1815 │   logging_dir = args.logging_dir + "/" + log_prefix + t │
│   1816                                                           │
│ ❱ 1817   accelerator = Accelerator(gradient_accumulation_steps=a │
│   1818 │   │   │   │   │   │   │   log_with=log_with, logging_di │
│   1819                                                           │
│   1820   # accelerateの互換性問題を解決する                      │
│                                                                  │
│ /home/steve/.local/lib/python3.10/site-packages/accelerate/accel │
│ erator.py:355 in __init__                                        │
│                                                                  │
│    352 │   │   if self.state.mixed_precision == "fp16" and self. │
│    353 │   │   │   self.native_amp = True                        │
│    354 │   │   │   if not torch.cuda.is_available() and not pars │
│ ❱  355 │   │   │   │   raise ValueError(err.format(mode="fp16",  │
│    356 │   │   │   kwargs = self.scaler_handler.to_kwargs() if s │
│    357 │   │   │   if self.distributed_type == DistributedType.F │
│    358 │   │   │   │   from torch.distributed.fsdp.sharded_grad_ │
╰──────────────────────────────────────────────────────────────────╯
ValueError: fp16 mixed precision requires a GPU
╭─────────────── Traceback (most recent call last) ────────────────╮
│ /home/steve/.local/bin/accelerate:8 in <module>                  │
│                                                                  │
│   5 from accelerate.commands.accelerate_cli import main          │
│   6 if __name__ == '__main__':                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys. │
│ ❱ 8 │   sys.exit(main())                                         │
│   9                                                              │
│                                                                  │
│ /home/steve/.local/lib/python3.10/site-packages/accelerate/comma │
│ nds/accelerate_cli.py:45 in main                                 │
│                                                                  │
│   42 │   │   exit(1)                                             │
│   43 │                                                           │
│   44 │   # Run                                                   │
│ ❱ 45 │   args.func(args)                                         │
│   46                                                             │
│   47                                                             │
│   48 if __name__ == "__main__":                                  │
│                                                                  │
│ /home/steve/.local/lib/python3.10/site-packages/accelerate/comma │
│ nds/launch.py:1104 in launch_command                             │
│                                                                  │
│   1101 │   elif defaults is not None and defaults.compute_enviro │
│   1102 │   │   sagemaker_launcher(defaults, args)                │
│   1103 │   else:                                                 │
│ ❱ 1104 │   │   simple_launcher(args)                             │
│   1105                                                           │
│   1106                                                           │
│   1107 def main():                                               │
│                                                                  │
│ /home/steve/.local/lib/python3.10/site-packages/accelerate/comma │
│ nds/launch.py:567 in simple_launcher                             │
│                                                                  │
│    564 │   process = subprocess.Popen(cmd, env=current_env)      │
│    565 │   process.wait()                                        │
│    566 │   if process.returncode != 0:                           │
│ ❱  567 │   │   raise subprocess.CalledProcessError(returncode=pr │
│    568                                                           │
│    569                                                           │
│    570 def multi_gpu_launcher(args):                             │
╰──────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python3', 'main.py']' 
returned non-zero exit status 1.

jeffltc commented 1 year ago

Same question here.

jeffltc commented 1 year ago

Not sure if this is related to xformers. I cannot install the recommended xformers version.

Nigueres commented 1 year ago

Note: Some users report that ValueError: fp16 mixed precision requires a GPU occurs during training. In this case, answer 0 for the 6th question: What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:

(Single GPU with id 0 will be used.)

Maybe this could be helpful?
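
To confirm which answers were actually saved, the config file can be inspected. This is only a rough sketch: the path below is assumed to be the default location where `accelerate config` stores its answers, the `gpu_ids` and `mixed_precision` keys are assumed to be top-level entries in that file, and `read_flat_yaml` is a hypothetical helper written for this example, not part of accelerate.

```python
from pathlib import Path
from typing import Dict

# Assumption: default location of the file written by `accelerate config`.
DEFAULT_CONFIG = Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"


def read_flat_yaml(text: str) -> Dict[str, str]:
    """Parse top-level `key: value` lines; blank, comment, and indented lines are skipped."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#") or line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:
            # Drop surrounding whitespace and quotes, e.g. '0' becomes 0.
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


if __name__ == "__main__":
    if DEFAULT_CONFIG.exists():
        saved = read_flat_yaml(DEFAULT_CONFIG.read_text())
        # Assumed key names; a missing key is reported as "<not set>".
        for key in ("gpu_ids", "mixed_precision"):
            print(f"{key}: {saved.get(key, '<not set>')}")
    else:
        print(f"No config found at {DEFAULT_CONFIG}")
```

If the file shows the GPU id as 0 and the error still appears, the saved answer is probably not the cause.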

PIPIPIG233666 commented 1 year ago

First, you need a ROCm-enabled PyTorch build:


pip uninstall torch
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm5.4.2

Second, there has to be a ROCm-enabled xformers, which is still TBD, in other words currently impossible: see https://github.com/facebookresearch/xformers/issues/485
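
For the first step, a quick way to check whether the reinstalled PyTorch is really a ROCm build is a small script like the one below. It is a minimal sketch: `classify_torch_build` is a hypothetical helper written for this example, and it relies on `torch.version.hip` being set on ROCm wheels and `torch.version.cuda` being set on CUDA wheels. The final line prints the result of `torch.cuda.is_available()`, the same call that appears in the accelerate traceback in the original report.

```python
from typing import Optional


def classify_torch_build(hip: Optional[str], cuda: Optional[str]) -> str:
    """Classify a PyTorch build from torch.version.hip and torch.version.cuda."""
    if hip:
        return "rocm"  # ROCm wheels set torch.version.hip
    if cuda:
        return "cuda"  # CUDA wheels (e.g. +cu118) set torch.version.cuda
    return "cpu"


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    kind = classify_torch_build(
        getattr(torch.version, "hip", None),
        getattr(torch.version, "cuda", None),
    )
    print(f"torch {torch.__version__}: {kind} build")
    # Same call that accelerate makes before raising the fp16 error.
    print("torch.cuda.is_available():", torch.cuda.is_available())


if __name__ == "__main__":
    main()
```

With the 2.0.0.dev20230128+cu118 wheel from the log above, this would likely report a cuda build with `torch.cuda.is_available()` returning False on an AMD card, which matches the error.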

kohya-ss / sd-scripts

Error: fp16 mixed precision requires a GPU; exit status 1 #252