iperov / DeepFaceLab

DeepFaceLab is the leading software for creating deepfakes.
GNU General Public License v3.0

Can't use gpu #5406

[Open] mitko00 opened this issue 2 years ago

mitko00 commented 2 years ago

Hello! I have an MX150 GPU and I can't use it; I have to train on the CPU, but it is slow. Here is the output from the SAEHD training .bat:

Running trainer.

Choose one of saved models, or enter a name to create a new model. [r] : rename [d] : delete

[0] : capsnaps - latest : 0
0
Loading capsnaps_SAEHD model...

Choose one or several GPU idxs (separated by comma).

[CPU] : CPU
[0] : NVIDIA GeForce MX150
[1] : Intel(R) UHD Graphics 620

[1] Which GPU indexes to choose? : 0
0

Initializing models: 100%|###############################################################| 5/5 [00:43<00:00, 8.73s/it]
Loading samples: 100%|##############################################################| 768/768 [00:05<00:00, 132.88it/s]
Loading samples: 100%|##############################################################| 709/709 [00:06<00:00, 106.63it/s]

================== Model Summary ==================
== Model name: capsnaps_SAEHD
== Current iteration: 4084
==---------------- Model Options ----------------==
== resolution: 128
== face_type: head
== models_opt_on_gpu: True
== archi: liae
== ae_dims: 256
== e_dims: 64
== d_dims: 64
== d_mask_dims: 22
== masked_training: True
== eyes_mouth_prio: True
== uniform_yaw: False
== blur_out_mask: False
== adabelief: True
== lr_dropout: n
== random_warp: True
== true_face_power: 0.0
== face_style_power: 0.0
== bg_style_power: 0.0
== ct_mode: none
== clipgrad: False
== pretrain: False
== autobackup_hour: 0
== write_preview_history: True
== target_iter: 10000
== random_src_flip: False
== random_dst_flip: False
== batch_size: 4
== gan_power: 0.0
== gan_patch_size: 16
== gan_dims: 16
==----------------- Running On ------------------==
== Device index: 0
== Name: NVIDIA GeForce MX150
== VRAM: 1.43GB
===================================================

Starting. Target iteration: 10000. Press "Enter" to stop training and save model.

Error: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[32768] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
     [[node mul_93 (defined at C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[concat_4/concat/_681]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[32768] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
     [[node mul_93 (defined at C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.

Original stack trace for 'mul_93':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 557, in on_initialize
    src_dst_loss_gv_op = self.src_dst_opt.get_update_op (nn.average_gv_list (gpu_G_loss_gvs))
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 63, in get_update_op
    v_t = self.beta_2*vs + (1.0-self.beta_2) * tf.square(g-m_t)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\ops\variables.py", line 1079, in _run_op
    return tensor_oper(a.value(), *args, **kwargs)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 925, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 1206, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py", line 7196, in mul
    "Mul", x=x, y=y, name=name)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3371, in create_op
    attrs, op_def, compute_device)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3440, in _create_op_internal
    op_def=op_def)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1762, in __init__
    self._traceback = tf_stack.extract_stack()

Traceback (most recent call last):
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[32768] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
     [[{{node mul_93}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[concat_4/concat/_681]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[32768] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
     [[{{node mul_93}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\DeepFaceLab\mainscripts\Trainer.py", line 129, in trainerThread
    iter, iter_time = model.train_one_iter()
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\DeepFaceLab\models\ModelBase.py", line 474, in train_one_iter
    losses = self.onTrainOneIter()
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 772, in onTrainOneIter
    src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 577, in src_dst_train
    self.target_dstm_em:target_dstm_em,
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[32768] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
     [[node mul_93 (defined at C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[concat_4/concat/_681]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[32768] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
     [[node mul_93 (defined at C:\Users\Mitko\Downloads\DeepFaceLab_DirectX12_internal\python-3.6.8\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.
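
For reference, the hint repeated in the log refers to TensorFlow 1.x's RunOptions. DFL does not expose this switch, so the sketch below is purely illustrative of what enabling it looks like in a standalone session (using the TF1-compat spelling; the bundled tensorflow_core 1.x spells it tf.RunOptions):

  # Illustrative only: DFL does not expose this option; you would have to
  # patch its session.run calls yourself to pass options=run_opts.
  import tensorflow as tf

  tf.compat.v1.disable_eager_execution()

  # Ask TF to dump the live tensor allocations whenever a run hits OOM.
  run_opts = tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)

  a = tf.compat.v1.placeholder(tf.float32, shape=[None])
  b = a * 2.0

  with tf.compat.v1.Session() as sess:
      print(sess.run(b, feed_dict={a: [1.0, 2.0]}, options=run_opts))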

exploreTech32 commented 2 years ago

To me it looks like your GPU does not have enough VRAM, and that is why you see the resource-exhausted error. I suggest you try DFL-Colab.

141801 commented 2 years ago

How much VRAM is needed? Can we change it by modifying a config file? I use a GTX 1650 but also can't use the GPU.

(screenshot of the error output attached)
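
As a quick sanity check (hypothetical, not part of DFL itself), you can ask the bundled TensorFlow which devices it actually sees; on the DirectX12 build, GPUs are exposed as DirectML (DML) devices rather than the usual /GPU:n names:

  # Run with the python-3.6.8 interpreter bundled in the build.
  from tensorflow.python.client import device_lib

  for dev in device_lib.list_local_devices():
      # device_type is e.g. "CPU" or "DML"; memory_limit is in bytes.
      print(dev.name, dev.device_type, dev.memory_limit)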


exploreTech32 commented 2 years ago

The amount of VRAM required depends on the model settings. SAEHD models need more VRAM as resolution, batch size, and other settings increase. You cannot increase the VRAM available to you, but you can always reduce the model settings so training fits in fewer resources.
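
For example, on a 1-2 GB card the options shown in the Model Summary above could be dialed down along these lines (purely illustrative values; models_opt_on_gpu: False keeps the optimizer state off the GPU, which typically saves VRAM at some speed cost):

  == resolution: 96
  == batch_size: 2
  == models_opt_on_gpu: False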

joolstorrentecalo commented 1 year ago

Issue solved / already answered (or it seems like user error), please close it.