jmliu206 / LIC_TCM

MIT License

PIL issue #10

Open manyaafonsoWUR opened 1 year ago

manyaafonsoWUR commented 1 year ago

Thanks for sharing your implementation.

I am trying to run the training script on Colab. I set up the environment with

!pip install torch torchvision torchaudio compressai==1.2.0 einops timm pillow==10.0.0

and then downloaded and rearranged the Kodak dataset to match CompressAI's format.
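For reference, CompressAI's ImageFolder dataset reads images from train/ and test/ subfolders of the dataset root. The rearranging step can be sketched roughly as below. This is only a minimal sketch: the function name `arrange_image_folder`, the source folder, and the "last few files go to test" split are assumptions made for illustration, not what the repository prescribes.

```python
import shutil
from pathlib import Path


def arrange_image_folder(src_dir, root, n_test=4):
    """Copy PNG files from src_dir into root/train and root/test.

    Hypothetical helper: the split rule (last n_test files, in sorted
    order, go to test/) is arbitrary and chosen only for illustration.
    Returns the number of files copied to (train, test).
    """
    files = sorted(Path(src_dir).glob("*.png"))
    n_test = min(n_test, len(files))

    train_dir = Path(root) / "train"
    test_dir = Path(root) / "test"
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)

    split = len(files) - n_test
    for i, path in enumerate(files):
        target = train_dir if i < split else test_dir
        # copy2 copies the bytes unchanged, so the files stay valid PNGs
        shutil.copy2(path, target / path.name)
    return split, n_test
```

Calling `arrange_image_folder("kodak_raw", "data")` would then produce `data/train` and `data/test`, matching the `-d data/` argument passed to train.py; `kodak_raw` is a placeholder name for wherever the images were downloaded.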

When I try to run

!CUDA_VISIBLE_DEVICES='0' python train.py -d data/ --cuda --N 128 --lambda 0.05 --epochs 50 --num-workers 1 --lr_epoch 45 48 --save_path ./pretrained --save

I get the following output:

2023-08-03 15:23:43.125043: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-08-03 15:23:44.183693: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
model : bmshj2018-factorized
dataset : data/
epochs : 50
learning_rate : 0.0001
num_workers : 1
lmbda : 0.05
batch_size : 8
test_batch_size : 8
aux_learning_rate : 0.001
patch_size : (256, 256)
cuda : True
save : True
seed : 100
clip_max_norm : 1.0
checkpoint : None
type : mse
save_path : ./pretrained
skip_epoch : 0
N : 128
lr_epoch : [45, 48]
continue_train : True
cuda
milestones: [45, 48]
Learning rate: 0.0001
Traceback (most recent call last):
  File "/content/LIC_TCM/train.py", line 426, in <module>
    main(sys.argv[1:])
  File "/content/LIC_TCM/train.py", line 391, in main
    train_one_epoch(
  File "/content/LIC_TCM/train.py", line 121, in train_one_epoch
    for i, d in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
PIL.UnidentifiedImageError: Caught UnidentifiedImageError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/compressai/datasets/image.py", line 75, in __getitem__
    img = Image.open(self.samples[index]).convert("RGB")
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3280, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file '/content/LIC_TCM/data/train/img003.png'

I checked the image; it is neither corrupt nor 0 bytes. Could you give me some input on what might be causing this issue?
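One further check that might help narrow this down is looking at the first bytes of each file, since a file can be non-empty and carry a .png extension while actually holding something else (for example, an HTML page saved by a failed download). The sketch below uses only the standard library; the helper names `sniff_format` and `find_suspect_files` are made up for illustration, and the signature table covers just a few common formats.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF8": "gif",
    b"BM": "bmp",
}


def sniff_format(path):
    """Guess the file type from its first bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def find_suspect_files(folder):
    """Return (path, first_bytes) for files with no known image signature."""
    suspects = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and sniff_format(path) is None:
            with open(path, "rb") as f:
                suspects.append((path, f.read(16)))
    return suspects
```

Running `find_suspect_files("data/train")` and printing the result would list any file whose header does not look like an image, together with its first bytes, which usually hint at what the file really contains. A file that passes this check can still be truncated or damaged further in, so this is a first filter rather than a full validation.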

Thanks in advance.