hollowstrawberry / kohya-colab

Accessible Google Colab notebooks for Stable Diffusion Lora training, based on the work of kohya-ss and Linaqruf
GNU General Public License v3.0
597 stars · 84 forks

CUDA out of memory: training fails on the Colab T4, which keeps reporting insufficient memory, even though in actual testing the same training runs on a local machine with 8 GB of VRAM. Why is this? #213

Open wzgrx opened 4 days ago

wzgrx commented 4 days ago

```
Traceback (most recent call last):
  File "/content/kohya-trainer/train_network_xl_wrapper.py", line 10, in <module>
    trainer.train(args)
  File "/content/kohya-trainer/train_network.py", line 251, in train
    train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
  File "/content/kohya-trainer/library/train_util.py", line 1823, in cache_latents
    dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
  File "/content/kohya-trainer/library/train_util.py", line 872, in cache_latents
    cache_batch_latents(vae, cache_to_disk, batch, subset.flip_aug, subset.random_crop)
  File "/content/kohya-trainer/library/train_util.py", line 2147, in cache_batch_latents
    latents = vae.encode(img_tensors).latent_dist.sample().to("cpu")
  File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 236, in encode
    h = self.encoder(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 139, in forward
    sample = down_block(sample)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_2d_blocks.py", line 1150, in forward
    hidden_states = resnet(hidden_states, temb=None)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/resnet.py", line 598, in forward
    hidden_states = self.nonlinearity(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/activation.py", line 405, in forward
    return F.silu(input, inplace=self.inplace)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2105, in silu
    return torch._C._nn.silu(input)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 253.06 MiB is free. Process 16410 has 14.50 GiB memory in use. Of the allocated memory 14.08 GiB is allocated by PyTorch, and 312.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
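The tail of the message itself points at one mitigation for the "reserved but unallocated" case: enabling PyTorch's expandable-segments allocator. A minimal sketch of setting it in a Colab cell before training starts (this only helps with fragmentation, not with a workload that is genuinely too large for the card):

```python
# Minimal sketch, following the hint in the error message itself: enable
# PyTorch's expandable-segments allocator to reduce fragmentation. This must
# run before the first CUDA allocation, e.g. in a cell before training starts.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```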

wzgrx commented 4 days ago

File "", line 235 get_ipython().system('pip install pytorch-lightning==1.9.0 voluptuous==0.13.1 toml==0.10.2 ftfy==6.1.1 einops==0.6.0 safetensors pygments') ^ IndentationError: unexpected indent

uYouUs commented 3 days ago

Your first error is probably caused by too many images or a batch size that is too high. For the second error, you manually changed the code from the original and put something that doesn't belong.
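For reference, the traceback shows the OOM happening while the VAE caches latents, so the knobs that usually matter are the VAE batch size, the training batch size, and caching latents to disk. A hedged sketch using kohya-ss sd-scripts flag names, with the model and dataset arguments the notebook normally fills in omitted (the Colab may expose these under different option names):

```python
# Hedged sketch of a lower-memory run using kohya-ss sd-scripts flags; the
# notebook normally builds this command for you, and the model/dataset
# arguments are omitted here. Flag names may differ between versions.
!accelerate launch train_network_xl_wrapper.py \
  --train_batch_size=1 \
  --vae_batch_size=1 \
  --cache_latents_to_disk \
  --mixed_precision=fp16 \
  --gradient_checkpointing
```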

wzgrx commented 3 days ago

> Your first error is probably caused by too many images or a batch size that is too high. For the second error, you manually changed the code from the original and put something that doesn't belong.

It's obviously the latest official code and I haven't changed anything, and the batch size is 1, the lowest possible. There is clearly a problem with the code.

uYouUs commented 3 days ago

How many images are in your dataset?

wzgrx commented 3 days ago

> How many images are in your dataset?

50 images, at 768x768.

reo224 commented 15 hours ago

When I try to load the checkpoint from Google Drive, the same error occurs, even with the latest version.