zudi-lin / rcan-it

Revisiting RCAN: Improved Training for Image Super-Resolution
MIT License

How to train this model on own dataset? #3

Open Techzist16 opened 2 years ago

Techzist16 commented 2 years ago

We need to train this model on our own dataset. Please help us in this regard.

zudi-lin commented 2 years ago

Hi @TriB90, happy to help! Our training data has a folder structure like this:

DF2K/
  bin/
    train_bin_HR.pt
    train_bin_LR_X2.pt
    train_bin_LR_X3.pt
    train_bin_LR_X4.pt
  DF2K_HR/
    0001.png
    ...
  DF2K_LR_bicubic/
    x2/
    x3/
    x4/

The folders DF2K_HR and DF2K_LR_bicubic contain high-resolution and downsampled PNG images, respectively. When you run training, you need to specify the config DATASET.DATA_DIR = "DF2K". We usually save the images as a single binary file in bin/ and load them directly into memory to save I/O time. To use the binary files, set DATASET.DATA_EXT = "bin".
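Before starting a run, it can help to confirm that a custom dataset matches the layout above. The following is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper, and it assumes only the folder names shown above and that every LR scale folder should hold as many PNG files as DF2K_HR.

```python
from pathlib import Path


def check_layout(data_dir, scales=(2, 3, 4)):
    """Return a list of problems found under data_dir (empty list if none).

    Hypothetical helper: it only checks the folder names shown in the
    tree above and compares PNG counts; it does not open any image.
    """
    root = Path(data_dir)
    hr_dir = root / "DF2K_HR"
    if not hr_dir.is_dir():
        return [f"missing folder: {hr_dir}"]

    problems = []
    n_hr = len(list(hr_dir.glob("*.png")))
    if n_hr == 0:
        problems.append(f"no PNG files in {hr_dir}")

    for s in scales:
        lr_dir = root / "DF2K_LR_bicubic" / f"x{s}"
        if not lr_dir.is_dir():
            problems.append(f"missing folder: {lr_dir}")
            continue
        n_lr = len(list(lr_dir.glob("*.png")))
        if n_lr != n_hr:
            # Assumption: one LR image per HR image at each scale.
            problems.append(f"{lr_dir} has {n_lr} PNG files, expected {n_hr}")
    return problems
```

Calling `check_layout("DF2K")` returns an empty list when all three scale folders exist and match the HR count; pass `scales=(2,)` if you only prepared x2 data.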

If this does not address your question, please let me know your data format, and we can discuss more.

Techzist16 commented 2 years ago

@zudi-lin First, I want to try your pre-trained model on different test images, but I was not able to run it because the README does not clearly explain how to run the test code. Please help me in this regard.

Techzist16 commented 2 years ago

DATASET.DATA_EXT = "bin"

I could not create the binary file. I placed 18 HR images in the 'rcan-it/ptsr/datasets/SR/BIX2X3X4/DF2K/DF2K_train_HR/' folder and the corresponding LR images in the 'rcan-it/ptsr/datasets/SR/BIX2X3X4/DF2K/DF2K_train_LR_bicubic/X2' folder. Training fails with the following error:

`Total number of parameters: 15444667
Using Dataset(s): DF2K for training
/content/drive/MyDrive/rcan-it/ptsr/datasets/SR/BIX2X3X4/DF2K/bin/train_bin_HR.pt does not exist. Now making binary...
Bin pt file with name and image
Traceback (most recent call last):
  File "main.py", line 135, in <module>
    main()
  File "main.py", line 98, in main
    loader = Data(cfg)
  File "/content/drive/MyDrive/rcan-it/ptsr/data/__init__.py", line 35, in __init__
    datasets.append(getattr(m, module_name)(cfg, name=d))
  File "/content/drive/MyDrive/rcan-it/ptsr/data/df2k.py", line 17, in __init__
    cfg, name=name, train=train, benchmark=benchmark
  File "/content/drive/MyDrive/rcan-it/ptsr/data/srdata.py", line 39, in __init__
    cfg.DATASET.DATA_EXT, list_hr, self._name_hrbin()
  File "/content/drive/MyDrive/rcan-it/ptsr/data/srdata.py", line 148, in _check_and_load
    } for _l in l]
  File "/content/drive/MyDrive/rcan-it/ptsr/data/srdata.py", line 148, in <listcomp>
    } for _l in l]
  File "/usr/local/lib/python3.7/dist-packages/imageio/core/functions.py", line 221, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/imageio/core/functions.py", line 130, in get_reader
    request = Request(uri, "r" + mode, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/imageio/core/request.py", line 125, in __init__
    self._parse_uri(uri)
  File "/usr/local/lib/python3.7/dist-packages/imageio/core/request.py", line 273, in _parse_uri
    raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: '/content/drive/MyDrive/rcan-it/ptsr/datasets/SR/BIX2X3X4/DF2K/DF2K_train_HR/0019.png'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1678) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 723, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 719, in main
    run(args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 713, in run
    )(cmd_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:`

Please help me to solve this issue @zudi-lin.

zudi-lin commented 2 years ago

@TriB90 Guidance for training on a custom dataset is posted here: https://github.com/zudi-lin/rcan-it#custom_data
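As a quick check for the traceback above: the loader failed on 0019.png while the folder held 18 images, which suggests it was asked for more numbered files than exist. A minimal sketch for listing which expected files are absent follows. `missing_images` is a hypothetical helper, not a function from this repository, and the zero-padded four-digit naming is assumed from the file names shown in this thread.

```python
from pathlib import Path


def missing_images(folder, first, last, pattern="{:04d}.png"):
    """Return the expected file names in [first, last] that are not in folder.

    Hypothetical helper; the naming pattern is an assumption based on
    names such as 0001.png and 0019.png seen in this thread.
    """
    folder = Path(folder)
    expected = [pattern.format(i) for i in range(first, last + 1)]
    return [name for name in expected if not (folder / name).is_file()]
```

Run it on the HR folder with the index range your configuration requests. A non-empty result means the configured range and the files on disk disagree, so either add the missing images or narrow the range.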