Cannot train the model

researchmm / AOT-GAN-for-Inpainting

[TVCG'2023] AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)

Apache License 2.0

400 stars, 64 forks

Got the following error:

2024-03-10 10:12:48.171105: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-10 10:12:48.171155: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-10 10:12:48.172506: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-10 10:12:49.612674: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[**] create folder ../experiments/aotgan_places2_pconv256
Traceback (most recent call last):
  File "/content/drive/MyDrive/CODES/AOTGAN/src/train.py", line 51, in <module>
    main_worker(0, 1, args)
  File "/content/drive/MyDrive/CODES/AOTGAN/src/train.py", line 30, in main_worker
    trainer = Trainer(args)
  File "/content/drive/MyDrive/CODES/AOTGAN/src/trainer/trainer.py", line 23, in __init__
    self.dataloader = create_loader(args)
  File "/content/drive/MyDrive/CODES/AOTGAN/src/data/__init__.py", line 14, in create_loader
    data_loader = DataLoader(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 349, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 140, in __init__
    raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
ValueError: num_samples should be a positive integer value, but got num_samples=0

Hi, num_samples=0 means the dataset is empty, i.e. no images were found at the path you gave. You have to specify --dir_image and --data_train so that together they point to the directory storing your images, and put your masks (as .png files) into a "pconv" folder under --dir_mask (I know, the naming is confusing). Below is how the data is loaded:
https://github.com/researchmm/AOT-GAN-for-Inpainting/blob/418034627392289bdfc118d62bc49e6abd3bb185/src/data/dataset.py#L21C2-L24C84

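# images: every *.jpg and *.png in <dir_image>/<data_train>
self.image_path = []
for ext in ['*.jpg', '*.png']:
    self.image_path.extend(glob(os.path.join(args.dir_image, args.data_train, ext)))
# masks: every *.png in <dir_mask>/<mask_type>
self.mask_path = glob(os.path.join(args.dir_mask, args.mask_type, '*.png'))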
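Since an empty image list is what ends up as num_samples=0, it can help to run the same glob patterns yourself before starting training. A minimal sketch, assuming the path layout from the excerpt above; count_training_files is a hypothetical helper, not part of the repo:

```python
import os
from glob import glob


def count_training_files(dir_image, data_train, dir_mask, mask_type="pconv"):
    """Count the files the dataset excerpt would pick up (hypothetical helper).

    Repeats the excerpt's patterns without importing the repo:
    images from <dir_image>/<data_train>/*.jpg and *.png,
    masks from <dir_mask>/<mask_type>/*.png.
    """
    image_paths = []
    for ext in ["*.jpg", "*.png"]:
        image_paths.extend(glob(os.path.join(dir_image, data_train, ext)))
    mask_paths = glob(os.path.join(dir_mask, mask_type, "*.png"))
    return len(image_paths), len(mask_paths)


if __name__ == "__main__":
    # Same values as in the example command: ./data/images and ./data/pconv
    n_images, n_masks = count_training_files("./data", "images", "./data", "pconv")
    print(f"images: {n_images}, masks: {n_masks}")
    if n_images == 0:
        print("No .jpg/.png images found: check --dir_image and --data_train.")
    if n_masks == 0:
        print("No .png masks found: check --dir_mask and --mask_type.")
```

On Linux the patterns are case-sensitive, so files ending in .JPG or .jpeg are not counted either.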

And this is how each mask is chosen:
https://github.com/researchmm/AOT-GAN-for-Inpainting/blob/418034627392289bdfc118d62bc49e6abd3bb185/src/data/dataset.py#L48C1-L55C51

if self.mask_type == 'pconv':
    # pick a random mask file; raises ValueError if no .png masks were found
    index = np.random.randint(0, len(self.mask_path))
    mask = Image.open(self.mask_path[index])
    mask = mask.convert('L')
else:
    # no mask files: use a fixed square over the center of the image
    mask = np.zeros((self.h, self.w)).astype(np.uint8)
    mask[self.h//4:self.h//4*3, self.w//4:self.w//4*3] = 1
    mask = Image.fromarray(mask).convert('L')
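When --mask_type is anything other than pconv, the else branch above does not read any files and builds a fixed square mask instead. A stand-alone sketch of that mask with NumPy, to show which pixels it covers; center_square_mask is a hypothetical name, not a function in the repo:

```python
import numpy as np


def center_square_mask(h, w):
    """Rebuild the fallback mask from the else branch (hypothetical helper).

    Rows h//4 .. h//4*3 and columns w//4 .. w//4*3 (end exclusive) are set
    to 1; every other pixel stays 0.
    """
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[h // 4:h // 4 * 3, w // 4:w // 4 * 3] = 1
    return mask


mask = center_square_mask(256, 256)
print(mask.sum() / mask.size)  # 0.25
```

For a 256x256 image this is a 128x128 square, i.e. the middle half of each side and a quarter of the image area.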

For example, if you put the images into ./data/images and the masks into ./data/pconv, the expected command would be:

python train.py --dir_image ./data --dir_mask ./data --data_train images --mask_type pconv --image_size 256 --save_every 5000
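Because images are read from <dir_image>/<data_train> and masks from <dir_mask>/<mask_type>, the four data flags can be derived from the two folder paths. A small sketch of that mapping, assuming the mask folder is named pconv so that mask files are actually used; build_data_args is a hypothetical helper, not part of train.py:

```python
import os


def build_data_args(image_folder, mask_folder):
    """Split two folder paths into the data flags of train.py (hypothetical helper).

    <parent>/<name> of the image folder becomes --dir_image <parent> --data_train <name>;
    the mask folder likewise becomes --dir_mask <parent> --mask_type <name>.
    """
    image_folder = os.path.normpath(image_folder)  # also drops a trailing slash
    mask_folder = os.path.normpath(mask_folder)
    mask_type = os.path.basename(mask_folder)
    if mask_type != "pconv":
        # In the dataset excerpt only mask_type == 'pconv' reads mask files.
        raise ValueError(f"mask folder should be named 'pconv', got {mask_type!r}")
    return [
        "--dir_image", os.path.dirname(image_folder),
        "--data_train", os.path.basename(image_folder),
        "--dir_mask", os.path.dirname(mask_folder),
        "--mask_type", mask_type,
    ]


print(" ".join(["python", "train.py"] + build_data_args("./data/images", "./data/pconv")))
```

On Linux this prints python train.py --dir_image data --data_train images --dir_mask data --mask_type pconv; os.path.normpath drops the leading ./, which names the same directory. The remaining options (--image_size, --save_every) are appended as before.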

Cannot train the model #13