I am trying to run the training script on Colab. I used
!pip install torch torchvision torchaudio compressai==1.2.0 einops timm pillow==10.0.0
to set up the environment, then downloaded the Kodak dataset and rearranged it to match CompressAI's expected format.
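For reference, CompressAI's ImageFolder dataset reads images from a train/ and a test/ subfolder under the dataset root. A minimal sketch of such a rearrangement is below. The helper name arrange_dataset, the n_test split size, and the choice to send the last files to test/ are illustrative assumptions, not the exact steps used here.

```python
import shutil
from pathlib import Path


def arrange_dataset(src_dir, root_dir, n_test=4):
    """Copy PNGs from src_dir into root_dir/train and root_dir/test.

    Hypothetical helper: the split rule (last n_test files go to test/)
    is an arbitrary choice made for illustration.
    """
    src = sorted(Path(src_dir).glob("*.png"))
    root = Path(root_dir)
    for split in ("train", "test"):
        (root / split).mkdir(parents=True, exist_ok=True)

    n_train = max(len(src) - n_test, 0)
    for i, path in enumerate(src):
        split = "train" if i < n_train else "test"
        # copy2 preserves file contents and metadata; originals are untouched.
        shutil.copy2(path, root / split / path.name)
    return n_train, len(src) - n_train
```

With this layout in place, passing -d data/ to train.py should make the loader look in data/train and data/test.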
When I try to run
!CUDA_VISIBLE_DEVICES='0' python train.py -d data/ --cuda --N 128 --lambda 0.05 --epochs 50 --num-workers 1 --lr_epoch 45 48 --save_path ./pretrained --save
I get the following output:
2023-08-03 15:23:43.125043: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-08-03 15:23:44.183693: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
model : bmshj2018-factorized
dataset : data/
epochs : 50
learning_rate : 0.0001
num_workers : 1
lmbda : 0.05
batch_size : 8
test_batch_size : 8
aux_learning_rate : 0.001
patch_size : (256, 256)
cuda : True
save : True
seed : 100
clip_max_norm : 1.0
checkpoint : None
type : mse
save_path : ./pretrained
skip_epoch : 0
N : 128
lr_epoch : [45, 48]
continue_train : True
cuda
milestones: [45, 48]
Learning rate: 0.0001
Traceback (most recent call last):
  File "/content/LIC_TCM/train.py", line 426, in <module>
    main(sys.argv[1:])
  File "/content/LIC_TCM/train.py", line 391, in main
    train_one_epoch(
  File "/content/LIC_TCM/train.py", line 121, in train_one_epoch
    for i, d in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
PIL.UnidentifiedImageError: Caught UnidentifiedImageError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/compressai/datasets/image.py", line 75, in __getitem__
    img = Image.open(self.samples[index]).convert("RGB")
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3280, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file '/content/LIC_TCM/data/train/img003.png'
I checked the image; it is neither corrupt nor 0 bytes. Could you suggest what might be causing this issue?
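One way to make the "not corrupt" check more concrete is to look at the first bytes of each file, since a non-empty file with a .png extension can still hold something else (for example, an HTML page saved by a failed download). The sketch below is only a diagnostic under that assumption, not a confirmed cause of the error. The helper names sniff_format and find_suspect_files are hypothetical, and the signature table covers only a few common formats.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_format(path):
    """Return the format implied by the file's first bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def find_suspect_files(folder):
    """Yield (path, first_16_bytes) for files whose header matches no known format."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and sniff_format(path) is None:
            yield path, path.read_bytes()[:16]
```

Running list(find_suspect_files("data/train")) would list every file the check cannot recognize, together with its leading bytes, which usually hint at what the file really contains.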
Thanks for sharing your implementation.
Thanks in advance.