victorca25 / traiNNer

traiNNer: Deep learning framework for image and video super-resolution, restoration and image-to-image translation, for training and testing.
Apache License 2.0

Correct usage of lmdb #37

Closed N0manDemo closed 3 years ago

N0manDemo commented 3 years ago

I used create_lmdb.py to create both my LR and HR datasets, and I was wondering how I should configure my options file. Do the settings differ from using HR/LR image folders?

victorca25 commented 3 years ago

Hello! Technically you only need to point the dataroot_HR and dataroot_LR to the correct directories ending in '.lmdb' and they should be loaded correctly. I haven't used lmdb in a while, so let me know how it goes!
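A sketch of what the dataset entries in the options file might look like with lmdb paths (the surrounding keys and exact paths are illustrative, based on BasicSR-style JSON options files, not confirmed in this thread):

```json
{
  "datasets": {
    "train": {
      "name": "DIV2K",
      "dataroot_HR": "../../datasets/main/hr.lmdb",
      "dataroot_LR": "../../datasets/main/lr.lmdb"
    }
  }
}
```

The key point from the reply above is only that `dataroot_HR` and `dataroot_LR` must point at directories whose names end in `.lmdb`.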

victorca25 commented 3 years ago

The directories should look like:

```
train_HR.lmdb
├── data.mdb
├── lock.mdb
└── meta_info.txt
```
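A quick stdlib-only sanity check for that layout can catch a half-built database before training starts. This helper is hypothetical (not part of the repo), assuming only the three files listed above:

```python
import os

def check_lmdb_dir(path):
    """Return the list of expected lmdb files missing from `path`.

    An empty list means the directory looks complete; anything else
    suggests the database was not fully created.
    """
    expected = ["data.mdb", "lock.mdb", "meta_info.txt"]
    if not path.endswith(".lmdb"):
        print(f"warning: {path} does not end in .lmdb")
    return [f for f in expected
            if not os.path.isfile(os.path.join(path, f))]
```

Usage would be something like `check_lmdb_dir("../../datasets/train_HR.lmdb")` before pointing the options file at the directory.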

N0manDemo commented 3 years ago

Hi victorca25,

I receive this error when loading images from my lmdb directory.

My Log File error_lmdb.log

My Config File:

train_esrgan.txt

```
21-02-12 17:31:47.474 - INFO: Random seed: 0
21-02-12 17:31:47.479 - INFO: Read lmdb keys from cache: ../../datasets/main/hr.lmdb/_keys_cache.p
21-02-12 17:31:47.479 - INFO: Dataset [LRHRDataset - DIV2K] is created.
21-02-12 17:31:47.479 - INFO: Number of train images: 44, iters: 6
21-02-12 17:31:47.479 - INFO: Total epochs needed: 83334 for iters 500,000
21-02-12 17:31:47.479 - INFO: Read lmdb keys from cache: ../../datasets/main/val/hr.lmdb/_keys_cache.p
21-02-12 17:31:47.479 - INFO: Dataset [LRHRDataset - val_set14_part] is created.
21-02-12 17:31:47.479 - INFO: Number of val images in [val_set14_part]: 44
21-02-12 17:31:47.631 - INFO: AMP library available
21-02-12 17:31:48.803 - INFO: Initialization method [kaiming]
21-02-12 17:31:49.020 - INFO: Initialization method [kaiming]
21-02-12 17:31:49.931 - INFO: AMP enabled
21-02-12 17:31:49.939 - INFO: Network G structure: DataParallel - RRDBNet, with parameters: 16,697,987
21-02-12 17:31:49.939 - INFO: Network D structure: DataParallel - Discriminator_VGG, with parameters: 14,502,281
21-02-12 17:31:49.939 - INFO: Model [SRRaGANModel] is created.
21-02-12 17:31:49.939 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 416, in <module>
    main()
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 412, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 215, in fit
    for n, train_data in enumerate(dataloaders['train'], start=1):
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.

Original Traceback (most recent call last):
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/ext4-storage/Training/BasicSR/codes/data/LRHRC_dataset.py", line 224, in __getitem__
    img_HR = util.read_img(self.HR_env, HR_path, out_nc=image_channels)
  File "/mnt/ext4-storage/Training/BasicSR/codes/dataops/common.py", line 129, in read_img
    img = fix_img_channels(img, out_nc)
  File "/mnt/ext4-storage/Training/BasicSR/codes/dataops/common.py", line 139, in fix_img_channels
    if img.ndim == 2:
AttributeError: 'NoneType' object has no attribute 'ndim'
```

victorca25 commented 3 years ago

Can you try adding the following at line 100 here: https://github.com/victorca25/BasicSR/blob/master/codes/dataops/common.py

```python
print("env: ", env)
```

and let me know what it prints in the console?

N0manDemo commented 3 years ago

```
21-02-13 13:08:32.573 - INFO: Random seed: 0
21-02-13 13:08:32.594 - INFO: Read lmdb keys from cache: ../../datasets/main/hr.lmdb/_keys_cache.p
21-02-13 13:08:32.595 - INFO: Dataset [LRHRDataset - DIV2K] is created.
21-02-13 13:08:32.595 - INFO: Number of train images: 44, iters: 6
21-02-13 13:08:32.595 - INFO: Total epochs needed: 83334 for iters 500,000
21-02-13 13:08:32.596 - INFO: Read lmdb keys from cache: ../../datasets/main/val/hr.lmdb/_keys_cache.p
21-02-13 13:08:32.597 - INFO: Dataset [LRHRDataset - val_set14_part] is created.
21-02-13 13:08:32.597 - INFO: Number of val images in [val_set14_part]: 44
21-02-13 13:08:33.009 - INFO: AMP library available
21-02-13 13:08:36.369 - INFO: Initialization method [kaiming]
21-02-13 13:08:36.587 - INFO: Initialization method [kaiming]
21-02-13 13:08:38.641 - INFO: AMP enabled
21-02-13 13:08:38.648 - INFO: Network G structure: DataParallel - RRDBNet, with parameters: 16,697,987
21-02-13 13:08:38.649 - INFO: Network D structure: DataParallel - Discriminator_VGG, with parameters: 14,502,281
21-02-13 13:08:38.649 - INFO: Model [SRRaGANModel] is created.
21-02-13 13:08:38.649 - INFO: Start training from epoch: 0, iter: 0
env: None
env: None
env: None
env: None
env: None
Traceback (most recent call last):
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 416, in <module>
    main()
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 412, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 215, in fit
    for n, train_data in enumerate(dataloaders['train'], start=1):
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.

Original Traceback (most recent call last):
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/ext4-storage/Training/BasicSR/codes/data/LRHRC_dataset.py", line 224, in __getitem__
    img_HR = util.read_img(self.HR_env, HR_path, out_nc=image_channels)
  File "/mnt/ext4-storage/Training/BasicSR/codes/dataops/common.py", line 129, in read_img
    img = fix_img_channels(img, out_nc)
  File "/mnt/ext4-storage/Training/BasicSR/codes/dataops/common.py", line 139, in fix_img_channels
    if img.ndim == 2:
AttributeError: 'NoneType' object has no attribute 'ndim'

(main) [n0man@fedora-desktop-n0man codes]$
```

victorca25 commented 3 years ago

Ok, so the problem is that for some reason the environment variables for lmdb are not being correctly passed to the read function:

```
env: None
env: None
env: None
env: None
```
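As a side note, the crash itself could be made clearer with a guard before the channel fix, so a failed read reports the path and the lmdb environment state instead of an opaque `NoneType` error. A minimal sketch (this wrapper and `read_fn` are hypothetical, not the repo's actual fix):

```python
def read_img_checked(env, path, read_fn):
    """Fail loudly when an image cannot be read, instead of letting a
    None propagate into fix_img_channels.

    `read_fn(env, path)` stands in for the real lmdb/disk read; it
    returns an image array on success or None on failure.
    """
    img = read_fn(env, path)
    if img is None:
        raise IOError(
            f"read_img returned None for {path!r} "
            f"(lmdb env is {'missing' if env is None else 'set'}); "
            "check that the lmdb environment was opened and the key exists"
        )
    return img
```

With this guard, the `env: None` case above would raise an explicit error in the dataset worker rather than the `'NoneType' object has no attribute 'ndim'` seen in the traceback.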

I'm working on something else at the moment, but I'll try to take a look to see if I find where the issue is.

victorca25 commented 3 years ago

I may have found a solution, but it will take a while to commit, since I have been modifying the dataloaders and they are not in a state I can commit at the moment.

N0manDemo commented 3 years ago

Okay, I’m working on a project right now that doesn’t require lmdb support, so feel free to put this bug on hold.

Thank you for your help.

N0man


victorca25 commented 3 years ago

Awesome! It won't be on hold; I just need one or two more days to finish testing it and to get the dataloaders into a state that can be committed. I'll let you know when it's up.

victorca25 commented 3 years ago

@N0manDemo the updated datasets and lmdb code have now been committed. Please refer to the wiki for more details about the updated lmdb support; you will have to recreate the database with the script, but it should work much better now.

N0manDemo commented 3 years ago

Thank you.

lmdb is working now.