ValueError: num_samples should be a positive integer value, but got num_samples=0

JanineCHEN commented 4 years ago

When executing python ./train.py -d 0,1 --identifier su3 config/su3.yaml, I got the following issue:

{   'io': {   'augmentation_level': 2,
              'datadir': 'data/su3/',
              'dataset': 'Wireframe',
              'focal_length': 2.1875,
              'logdir': 'logs/',
              'num_vpts': 3,
              'num_workers': 6,
              'resume_from': None,
              'tensorboard_port': 0,
              'validation_debug': 120,
              'validation_interval': 24000},
    'model': {   'backbone': 'stacked_hourglass',
                 'batch_size': 6,
                 'conic_6x': False,
                 'depth': 4,
                 'fc_channel': 1024,
                 'im2col_step': 11,
                 'multires': <BoxList: [0.0013457768043554, 0.0051941870036646, 0.02004838034795, 0.0774278195486317, 0.299564810864565]>,
                 'num_blocks': 1,
                 'num_stacks': 1,
                 'output_stride': 4,
                 'smp_multiplier': 2,
                 'smp_neg': 1,
                 'smp_pos': 1,
                 'smp_rnd': 3,
                 'upsample_scale': 1},
    'optim': {   'amsgrad': True,
                 'lr': 0.0001,
                 'lr_decay_epoch': 24,
                 'max_epoch': 36,
                 'name': 'Adam',
                 'weight_decay': 1e-05}}
Let's use 1 GPU(s)!
ntrain: 0
Traceback (most recent call last):
  File "./train.py", line 183, in <module>
    main()
  File "./train.py", line 106, in main
    train_loader = torch.utils.data.DataLoader(
  File "/home/anaconda3/envs/lcnn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
    sampler = RandomSampler(dataset)
  File "/home/anaconda3/envs/lcnn/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 93, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

Any help with this would be highly appreciated!

JanineCHEN commented 4 years ago

BTW, I also tried python ./train.py -d 0 --identifier su3 config/su3.yaml since I only have one GPU, and got the same error message. Does the number of GPUs have anything to do with the error?

zhou13 commented 4 years ago

ntrain: 0 normally happens when PyTorch cannot find the data (images) to read. Could you double-check that?
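A quick way to check this before launching training is to count the image files under the configured datadir. The snippet below is only a minimal sketch: the helper names `count_images` and `require_images` and the list of file suffixes are assumptions made for illustration, not part of neurvps, and the repository's own dataset class may select files differently.

```python
from pathlib import Path

# Assumption: the dataset consists of image files with these suffixes.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(datadir):
    """Count image files anywhere below datadir (hypothetical helper)."""
    root = Path(datadir)
    if not root.is_dir():
        raise FileNotFoundError(f"data directory does not exist: {root.resolve()}")
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def require_images(datadir):
    """Fail early with a readable message instead of num_samples=0."""
    n = count_images(datadir)
    if n == 0:
        raise RuntimeError(
            f"no images found under {Path(datadir).resolve()}; "
            "check the 'datadir' entry in the config"
        )
    return n
```

Calling require_images("data/su3/") from the directory where train.py is run would show whether the relative path in the config resolves to the place the data was actually unpacked.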

JanineCHEN commented 4 years ago

> ntrain: 0 normally happens when PyTorch cannot find the data (images) to read. Could you double-check that?

Hey, thanks for your prompt response. It was indeed a data path mishap. Sorry for the trouble.

Can I ask one more question before closing the issue? After fixing the path, I got RuntimeError: CUDA out of memory. What are the minimum GPU requirements for training and for evaluation, respectively? Thank you.

zhou13 commented 4 years ago

The default hyperparameters are set for a GTX 1080 Ti or an RTX 2080 Ti with around 12 GB of memory. If your GPUs have less memory, you can try reducing the batch size, but the results may then not reproduce exactly.
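As a rough starting point for picking a smaller batch size, one can scale the default in proportion to the available memory. This is only a heuristic sketch: the helper `scaled_batch_size` is hypothetical, and it assumes memory use grows roughly linearly with batch size, which ignores fixed costs such as model weights and optimizer state. The default of 6 comes from the 'batch_size' entry in the config printed above.

```python
def scaled_batch_size(default_batch, default_mem_gb, available_mem_gb):
    """Scale a batch size down in proportion to GPU memory (rough heuristic).

    Assumption: memory use is roughly linear in batch size. The result is
    only a first guess and may still need to be lowered by trial and error.
    """
    if default_batch <= 0 or default_mem_gb <= 0 or available_mem_gb <= 0:
        raise ValueError("all arguments must be positive")
    scaled = int(default_batch * available_mem_gb / default_mem_gb)
    # Never exceed the default, and never go below one sample per batch.
    return max(1, min(default_batch, scaled))


# Example: defaults tuned for about 12 GB, card with 8 GB available.
print(scaled_batch_size(6, 12, 8))
```

The resulting value would then replace 'batch_size' under 'model' in config/su3.yaml. If the out-of-memory error persists, lower it further.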

JanineCHEN commented 4 years ago

> The default hyperparameters are set for a GTX 1080 Ti or an RTX 2080 Ti with around 12 GB of memory. If your GPUs have less memory, you can try reducing the batch size, but the results may then not reproduce exactly.

Thanks for the information!
