iperov / DeepFaceLab

DeepFaceLab is the leading software for creating deepfakes.
GNU General Public License v3.0

Error: OOM when allocating tensor of shape PLEASE HELP! #5743

Open ismailyugirov opened 8 months ago

ismailyugirov commented 8 months ago

Hi. I keep getting this error while training with Train SAEHD. Even when I lower the resolution or the batch size, the error persists. How can I solve this? Please help.

My system:

- CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
- GPU: RTX 3060
- RAM: 16GB
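Because this is an out-of-memory error on the GPU, the number that matters most is the VRAM actually available, not the 16GB of system RAM. The i7-11800H is a laptop processor, so the GPU is most likely the RTX 3060 Laptop GPU, which NVIDIA ships with 6 GB of VRAM (the desktop RTX 3060 has 12 GB, or 8 GB in a later variant); that is an assumption worth confirming. A minimal Python sketch for checking it, assuming the NVIDIA driver's `nvidia-smi` tool is on `PATH`; `parse_gpu_memory_csv` and `query_gpu_memory` are hypothetical helper names:

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory_csv(text: str) -> List[Tuple[str, int, int]]:
    """Parse output of
    nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv,noheader,nounits
    into (gpu_name, total_MiB, used_MiB) tuples, one per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, total, used = (field.strip() for field in line.rsplit(",", 2))
        gpus.append((name, int(total), int(used)))
    return gpus


def query_gpu_memory() -> List[Tuple[str, int, int]]:
    """Run nvidia-smi (assumed to be installed and on PATH) and parse its CSV output."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total,memory.used",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_gpu_memory_csv(out)
```

Calling `query_gpu_memory()` on the affected machine before starting the trainer shows the total VRAM and how much is already in use; memory held by the desktop or other applications is not available to TensorFlow.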

```
Initializing models: 80%|##################################################4 | 4/5 [00:11<00:02, 2.94s/it]
Error: OOM when allocating tensor of shape [163840,300] and type float
  [[node src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const (defined at C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38) ]]
```
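The allocation that fails is not large on its own. TensorFlow's `float` is 32-bit, so a tensor of shape `[163840,300]` holds 163,840 × 300 = 49,152,000 values, i.e. 196,608,000 bytes (187.5 MiB). The error therefore most likely means the GPU had less than that left after everything allocated before it, rather than this one tensor being too big. A minimal Python sketch that pulls the shape out of a TensorFlow OOM message and does this arithmetic; `oom_tensor_size` and the `DTYPE_BYTES` table are hypothetical helpers, not part of TensorFlow:

```python
import re
from functools import reduce
from operator import mul

# Byte sizes for TensorFlow dtype names that can appear in an OOM message.
DTYPE_BYTES = {"float": 4, "half": 2, "double": 8, "int32": 4, "int64": 8}

OOM_RE = re.compile(
    r"OOM when allocating tensor of shape \[([\d,\s]*)\] and type (\w+)"
)


def oom_tensor_size(message: str):
    """Return (shape, dtype, n_bytes) for the first OOM allocation in `message`."""
    match = OOM_RE.search(message)
    if match is None:
        raise ValueError("no OOM allocation found in message")
    dims_text, dtype = match.groups()
    shape = tuple(int(d) for d in dims_text.split(",") if d.strip())
    n_elements = reduce(mul, shape, 1)  # a scalar shape [] counts as 1 element
    return shape, dtype, n_elements * DTYPE_BYTES[dtype]


msg = ("OOM when allocating tensor of shape [163840,300] and type float "
       "[[node src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const]]")
shape, dtype, n_bytes = oom_tensor_size(msg)
print(shape, dtype, n_bytes, n_bytes / 2**20)  # (163840, 300) float 196608000 187.5
```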

```
Original stack trace for 'src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 341, in on_initialize
    self.src_dst_opt.initialize_variables (self.src_dst_saveable_weights, vars_on_cpu=optimizer_vars_on_cpu, lr_dropout_on_cpu=self.options['lr_dropout']=='cpu')
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 38, in initialize_variables
    vs = { v.name : tf.get_variable ( f'vs_{v.name}'.replace(':','_'), v.shape, dtype=v.dtype, initializer=tf.initializers.constant(0.0), trainable=False) for v in trainable_weights }
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 38, in <dictcomp>
    vs = { v.name : tf.get_variable ( f'vs_{v.name}'.replace(':','_'), v.shape, dtype=v.dtype, initializer=tf.initializers.constant(0.0), trainable=False) for v in trainable_weights }
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1595, in get_variable
    aggregation=aggregation)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1338, in get_variable
    aggregation=aggregation)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 593, in get_variable
    aggregation=aggregation)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 545, in _true_getter
    aggregation=aggregation)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 963, in _get_single_variable
    aggregation=aggregation)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 266, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 227, in _variable_v1_call
    shape=shape)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 205, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2642, in default_variable_creator
    shape=shape)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 270, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1670, in __init__
    shape=shape)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1799, in _init_from_args
    initial_value = initial_value()
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\init_ops.py", line 230, in __call__
    self.value, dtype=dtype, shape=shape, verify_shape=verify_shape)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\constant_op.py", line 171, in constant_v1
    allow_broadcast=False)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\constant_op.py", line 294, in _constant_impl
    "Const", [], [dtype_value.type], attrs=attrs, name=name).outputs[0]
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
    op_def=op_def)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)
```
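The frames from `AdaBelief.py` line 38 show what was being allocated: for every trainable weight, the optimizer creates a zero-initialized, non-trainable variable of the same shape. The failing node `src_dst_opt/vs_inter_B/dense1/weight_0` shows the naming: optimizer scope `src_dst_opt`, then `vs_` plus the weight name with `:` replaced by `_`. A small Python sketch that maps such a node back to the model weight it shadows; `slot_name` and `weight_from_node` are hypothetical helpers, and the parser assumes the node layout seen in this particular error:

```python
import re


def slot_name(prefix: str, weight_name: str) -> str:
    """Build a slot name the way the node name in the error suggests:
    prefix + '_' + weight name, with ':' replaced by '_'."""
    return f"{prefix}_{weight_name}".replace(":", "_")


def weight_from_node(node: str):
    """Split '<opt scope>/<slot>_<weight path>_<idx>[/Initializer/Const]'
    into (optimizer scope, slot prefix, TF weight name)."""
    node = re.sub(r"/Initializer/Const$", "", node)
    scope, _, rest = node.partition("/")   # 'src_dst_opt', 'vs_inter_B/...'
    slot, _, weight = rest.partition("_")  # slot prefix has no underscore
    # Undo the ':' -> '_' replacement on the trailing output index only.
    weight = re.sub(r"_(\d+)$", r":\1", weight)
    return scope, slot, weight


print(weight_from_node("src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const"))
# ('src_dst_opt', 'vs', 'inter_B/dense1/weight:0')
```

In other words, the OOM was hit while creating optimizer state for the model's `inter_B/dense1/weight`, not while processing training data.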

```
Traceback (most recent call last):
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
    return fn(*args)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [163840,300] and type float
  [[{{node src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const}}]]
```

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 657, in on_initialize
    model.init_weights()
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\Saveable.py", line 106, in init_weights
    nn.init_weights(self.get_weights())
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py", line 48, in init_weights
    nn.tf_sess.run (ops)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
    run_metadata_ptr)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
    run_metadata)
  File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [163840,300] and type float
  [[node src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const (defined at C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38) ]]
```
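This second traceback places the failure in `model.init_weights()` (`Model.py` line 657), which ends in `nn.tf_sess.run (ops)`: TensorFlow was still initializing variables, and no training batch had been fed yet. The tensor in the error, `[163840,300]`, also has no batch dimension, which is consistent with lowering the batch size not helping. A rough Python estimate of that batch-independent cost, assuming float32 values and two optimizer slots per weight (Adam-style optimizers such as AdaBelief keep a first- and a second-moment estimate per weight); `init_state_bytes` is a hypothetical helper, and the example counts only the single weight named in the error:

```python
from functools import reduce
from operator import mul


def init_state_bytes(weight_shapes, slots_per_weight=2, dtype_bytes=4):
    """Bytes for the weights plus per-weight optimizer slots of the same shape.

    There is deliberately no batch-size parameter: none of these tensors has a
    batch dimension, so the batch size does not change this number.
    """
    elements = sum(reduce(mul, shape, 1) for shape in weight_shapes)
    return elements * dtype_bytes * (1 + slots_per_weight)


# The one weight named in the error, plus its two optimizer slots.
need = init_state_bytes([(163840, 300)])
print(need, need / 2**20)  # 589824000 562.5
```

Summed over all of the model's weights, this gives a floor that has to fit in free VRAM before the first training step. The `vars_on_cpu=optimizer_vars_on_cpu` argument visible in the trace suggests DeepFaceLab can keep the optimizer slots in system RAM instead, which would move the `slots_per_weight` share off the GPU.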
