16GB is not enough?

wpl427 commented 10 months ago

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 14.61 GiB total capacity; 13.30 GiB already allocated; 9.19 MiB free; 13.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

What is the minimum video memory required?
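A quick way to read the numbers in the error message above: the allocator hint about max_split_size_mb only matters when reserved memory is much larger than allocated memory. The sketch below parses the figures from the message and computes that gap. The regular expressions and the helper name parse_oom are illustrative, not part of PyTorch.

```python
import re

# The numeric part of the error message quoted above.
ERROR = (
    "Tried to allocate 22.00 MiB (GPU 0; 14.61 GiB total capacity; "
    "13.30 GiB already allocated; 9.19 MiB free; "
    "13.78 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# Hypothetical patterns written for this message format only.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"([\d.]+) (MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (MiB|GiB) already allocated",
    "free": r"([\d.]+) (MiB|GiB) free",
    "reserved": r"([\d.]+) (MiB|GiB) reserved",
}


def parse_oom(text):
    """Return each figure from the OOM message, converted to MiB."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        stats[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return stats


stats = parse_oom(ERROR)
# Memory PyTorch holds in its cache but has not handed out to tensors.
cached_but_unused = stats["reserved"] - stats["allocated"]
# Share of the card already occupied by live tensors.
allocated_share = stats["allocated"] / stats["total"]

print(f"reserved - allocated: {cached_but_unused:.2f} MiB")
print(f"allocated share of total: {allocated_share:.1%}")
```

Here the gap is roughly 0.5 GiB while live tensors take about 91% of the card, so fragmentation tuning alone probably will not help much. Actual memory use has to come down.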

wpl427 commented 10 months ago

Or how can I reduce video memory usage by modifying the configuration?

zideliu commented 10 months ago

You can try loading the CLIP model in half precision in train_t2i_custom_v2.py: open_clip.create_model_and_transforms('ViT-bigG-14', 'laion2b_s39b_b160k', precision='fp16')
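To see why precision='fp16' helps, here is a rough back-of-the-envelope estimate of the weight memory alone. The parameter count of about 2.5 billion for ViT-bigG-14 is an assumption for illustration, not a figure from this thread. Activations, optimizer state and the other models in the training script are not counted.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

# Assumption for illustration: approximate parameter count of ViT-bigG-14.
ASSUMED_PARAMS = 2.5e9


def weight_gib(num_params, precision):
    """Memory taken by the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


fp32_gib = weight_gib(ASSUMED_PARAMS, "fp32")
fp16_gib = weight_gib(ASSUMED_PARAMS, "fp16")

print(f"fp32 weights: {fp32_gib:.2f} GiB")  # 9.31 GiB
print(f"fp16 weights: {fp16_gib:.2f} GiB")  # 4.66 GiB
```

Under that assumption, fp32 weights alone take most of a 14.61 GiB card, and half precision cuts the weight footprint in half.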

wpl427 commented 10 months ago

before:
prompt_model,_,_ = open_clip.create_model_and_transforms('ViT-bigG-14', 'laion2b_s39b_b160k')

after:
prompt_model,_,_ = open_clip.create_model_and_transforms('ViT-bigG-14', 'laion2b_s39b_b160k', precision='fp16')

error:
(faceswap) [root@prod-emr-gpu01 StyleDrop-PyTorch]# accelerate launch --num_processes 8 --mixed_precision fp16 train_t2i_custom_v2.py --config=configs/custom.py
2023-08-21 13:16:44.463755: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-08-21 13:16:45.329695: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
The following values were not passed to accelerate launch and had defaults used instead:
    --num_machines was set to a value of 1
    --num_cpu_threads_per_process was set to 8 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
2023-08-21 13:16:49.788896: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-08-21 13:16:52.551 | INFO | __main__:train:63 - Process 0 using device: cuda
I0821 13:16:52.551910 140480274581312 factory.py:158] Loaded ViT-bigG-14 model config.
2023-08-21 13:16:52.578 | DEBUG | open_clip.transformer:__init__:314 - xattn in transformer of CLIP is True
2023-08-21 13:17:09.847 | DEBUG | open_clip.transformer:__init__:314 - xattn in transformer of CLIP is True
I0821 13:17:20.080452 140480274581312 factory.py:206] Loading pretrained ViT-bigG-14 weights (laion2b_s39b_b160k).
Traceback (most recent call last):
  File "/data/miniconda3/envs/faceswap/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/data/miniconda3/envs/faceswap/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/data/miniconda3/envs/faceswap/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/data/miniconda3/envs/faceswap/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/miniconda3/envs/faceswap/bin/python', 'train_t2i_custom_v2.py', '--config=configs/custom.py']' died with <Signals.SIGKILL: 9>.
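A note on reading this failure: the traceback comes from the accelerate launcher, not from the training script. It only reports that the child process died with SIGKILL, and the last log line before that is the pretrained weight load. Python's subprocess module reports a child killed by signal N as return code -N. The helper below, whose name describe_returncode is made up for illustration, decodes such a code.

```python
import signal


def describe_returncode(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # A negative code means the child was terminated by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


print(describe_returncode(-9))
print(describe_returncode(1))
```

A SIGKILL that arrives without a Python exception from the script itself is often the Linux out-of-memory killer reacting to exhausted host RAM rather than GPU memory. That is only one possible cause and is not confirmed in this thread. Checking the kernel log on the machine would show whether it applies here.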

zideliu / StyleDrop-PyTorch

16GB is not enough? #18