Closed FurkanGozukara closed 7 months ago
Logging with TensorBoard writes data at every step, and the error says "Disk quota exceeded", so even if the disk has free space, a quota may be applied to output written to that disk. Unfortunately there is no option to disable stepwise logging, so please disable logging altogether (by removing the logging_dir option).
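The difference between free space and a quota can be checked directly. The following is a minimal sketch, not part of kohya_ss: `probe_write` and `report` are hypothetical helper names, and the 1 MiB probe size is an arbitrary assumption. It compares the free space reported by the filesystem with the result of an actual write, and reports whether the operating system answers with `EDQUOT` (the errno behind "Disk quota exceeded") or `ENOSPC` (disk genuinely full).

```python
import errno
import os
import shutil
import tempfile


def probe_write(directory, size_bytes=1024 * 1024):
    """Try to write and fsync a temporary file in `directory`.

    Returns (ok, reason): reason is "ok" on success, "quota" for EDQUOT,
    "no_space" for ENOSPC, otherwise the symbolic errno name.
    """
    # EDQUOT may not be defined on every platform, so look it up defensively.
    edquot = getattr(errno, "EDQUOT", None)
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            f.write(b"\0" * size_bytes)
            f.flush()
            os.fsync(f.fileno())
        return True, "ok"
    except OSError as e:
        if edquot is not None and e.errno == edquot:
            return False, "quota"
        if e.errno == errno.ENOSPC:
            return False, "no_space"
        return False, errno.errorcode.get(e.errno, str(e.errno))


def report(directory):
    """Combine the filesystem's free-space figure with a real write attempt."""
    usage = shutil.disk_usage(directory)
    ok, reason = probe_write(directory)
    return {"free_gb": usage.free / 1e9, "write_ok": ok, "reason": reason}


if __name__ == "__main__":
    # Point this at the directory that receives the logs.
    print(report("."))
```

A result with plenty of free space but `reason` equal to `"quota"` would match the situation described here. A small probe can still succeed while the quota is nearly exhausted, so a passing result does not prove that a long run will stay under the limit.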
Thanks, I will remember this.
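For anyone who wants to keep logging without risking a long run, one possible workaround is to wrap the logging call so that a write failure disables logging instead of ending training. This is only a sketch under assumptions: `TolerantLogger` is a hypothetical class, not a kohya_ss or Accelerate feature, and it relies only on the logging callable accepting `(values, step=...)`, as `accelerator.log` does in the traceback from this issue.

```python
import logging

logger = logging.getLogger(__name__)


class TolerantLogger:
    """Call `log_fn` every `every` steps and stop logging after the first failure."""

    def __init__(self, log_fn, every=1):
        self.log_fn = log_fn
        self.every = max(1, every)
        self.enabled = True

    def log(self, values, step):
        """Return True if the values were logged, False if skipped or failed."""
        if not self.enabled or step % self.every != 0:
            return False
        try:
            self.log_fn(values, step=step)
            return True
        except Exception as e:  # deliberately broad: logging must never stop training
            self.enabled = False
            logger.warning("Metric logging disabled after failure at step %d: %s", step, e)
            return False


# Hypothetical usage inside a training loop:
#   tracker = TolerantLogger(accelerator.log, every=100)
#   tracker.log(logs, step=global_step)
```

Catching every exception is a deliberate trade-off: it also hides genuine bugs in the logging path, so the warning message should be watched for. Logging less often through `every` reduces how frequently the event file is written and flushed.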
I had been running a training on RunPod for over 21 hours (21:52:51).
The training was cancelled with the following error, even though the pod still has over 100 GB of free disk space:
█████ | 32547/51360 [21:52:50<12:38:51, 2.42s/it, avr_loss=0.0996]
Traceback (most recent call last):
File "/workspace/kohya_ss/./sdxl_train.py", line 792, in <module>
train(args)
File "/workspace/kohya_ss/./sdxl_train.py", line 657, in train
accelerator.log(logs, step=global_step)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 619, in _inner
return PartialState().on_main_process(function)(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2399, in log
tracker.log(values, step=step, **log_kwargs.get(tracker.name, {}))
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/tracking.py", line 79, in execute_on_main_process
return PartialState().on_main_process(function)(self, *args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/tracking.py", line 247, in log
self.writer.flush()
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py", line 1200, in flush
writer.flush()
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py", line 150, in flush
self.event_writer.flush()
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/tensorboard/summary/writer/event_file_writer.py", line 125, in flush
self._async_writer.flush()
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/tensorboard/summary/writer/event_file_writer.py", line 190, in flush
self._writer.flush()
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/tensorboard/summary/writer/record_writer.py", line 43, in flush
self._writer.flush()
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/tensorflow/python/lib/io/file_io.py", line 221, in flush
self._writable_file.flush()
tensorflow.python.framework.errors_impl.ResourceExhaustedError: /workspace/stable-diffusion-webui/models/Stable-diffusion/sdxl_1_fp32/log/20240306030504/finetuning/events.out.tfevents.1709694327.7f4fcd28f189.2380.0; Disk quota exceeded
steps: 63%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 32547/51360 [21:52:51<12:38:51, 2.42s/it, avr_loss=0.0996]
Traceback (most recent call last):
File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
simple_launcher(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python', './sdxl_train.py',