tstandley / taskgrouping

Code for Which Tasks Should Be Learned Together in Multi-task Learning?

ValueError: num_samples should be a positive integer value, but got num_samples=0 #3

Open e96031413 opened 3 years ago

e96031413 commented 3 years ago

When I tried to train with the following command:

python3 train_taskonomy.py -d=/taskonomy/ -a=xception_taskonomy_new -j 4 -b 96 -lr=.1 --fp16 -sbn --tasks=sdnerac -r

I got the following error:

Traceback (most recent call last):
  File "train_taskonomy.py", line 724, in <module>
    main(args)
  File "train_taskonomy.py", line 223, in main
    num_workers=args.workers, pin_memory=True, sampler=None)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 262, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 104, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

After searching for the error on Google, I found that it is usually caused by a wrong dataset path.

I got the Taskonomy dataset with:

git clone https://github.com/alexsax/taskonomy-sample-model-1

I then renamed the folder from taskonomy-sample-model-1 to taskonomy and moved it inside the taskgrouping folder, so the folder structure looks like this:

├── model_definitions
│   ├── __init__.py
│   ├── ozan_min_norm_solvers.py
│   ├── ozan_rep_fun.py
│   ├── __pycache__
│   │   ├── __init__.cpython-36.pyc
│   │   ├── ozan_min_norm_solvers.cpython-36.pyc
│   │   ├── ozan_rep_fun.cpython-36.pyc
│   │   ├── resnet_taskonomy.cpython-36.pyc
│   │   ├── xception_taskonomy_joined_decoder.cpython-36.pyc
│   │   ├── xception_taskonomy_new.cpython-36.pyc
│   │   └── xception_taskonomy_small.cpython-36.pyc
│   ├── resnet_taskonomy.py
│   ├── xception_taskonomy_joined_decoder.py
│   ├── xception_taskonomy_new.py
│   └── xception_taskonomy_small.py
├── network_selection
│   ├── a.out
│   ├── main
│   ├── main.cpp
│   ├── Makefile
│   ├── make_plots.py
│   ├── plots
│   │   ├── setting_1.pdf
│   │   ├── setting_2.pdf
│   │   ├── setting_3.pdf
│   │   └── setting_4.pdf
│   ├── results_20.txt
│   ├── results_alt_20.txt
│   ├── results_alt_test.txt
│   ├── results_alt.txt
│   ├── results_large_20.txt
│   ├── results_large_test.txt
│   ├── results_large.txt
│   ├── results_mean.txt
│   ├── results_small_data_at4.txt
│   ├── results_small_data_test.txt
│   ├── results_small_data.txt
│   ├── results_test.txt
│   └── results.txt
├── __pycache__
│   ├── taskonomy_loader.cpython-36.pyc
│   └── taskonomy_losses.cpython-36.pyc
├── README.md
├── read_training_history.py
├── saved_models
│   └── placeholder
├── s.txt
├── sync_batchnorm
│   ├── batchnorm.py
│   ├── batchnorm_reimpl.py
│   ├── comm.py
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── batchnorm.cpython-36.pyc
│   │   ├── comm.cpython-36.pyc
│   │   ├── __init__.cpython-36.pyc
│   │   └── replicate.cpython-36.pyc
│   ├── replicate.py
│   └── unittest.py
├── taskonomy                                             --------------------->dataset here
│   ├── rgb
│   │   ├── building
│   │   │   ├── XXXXXXXXXXXX.png

Did I miss a step, or am I using the wrong dataset?

tstandley commented 3 years ago

Hi there. Thanks for your interest.

I'm not sure exactly what's going on, but that is the error you get when no dataset is found at that location.

I think the issue might be with your flag -d=/taskonomy/

That flag points to a taskonomy directory directly under your filesystem root (/). If the data is under your project directory instead, try: -d=taskonomy/

or even

-d="path to project directory"/taskonomy/

Let me know if that doesn't work.

Also, I think the data you get from git clone https://github.com/alexsax/taskonomy-sample-model-1 might need to be formatted/processed. It should be downsampled to 256x256 ahead of time; otherwise the code will downsample on the fly, which will be slow.

e96031413 commented 3 years ago

Thanks for your reply.

Unfortunately, the error still occurs.

I also checked the TaskonomyLoader class in taskonomy_loader.py. From line 45, I understand that with the flag -d=/taskonomy/, root is set to /taskonomy/, and os.path.join(root,'rgb') then gives the path /taskonomy/rgb/

But the error still occurs, which is pretty weird.

By the way, how did you download the Taskonomy dataset? The README only says "Get training data StanfordVL/taskonomy@master/data". So did you train with the full 12TB dataset?

tstandley commented 3 years ago

Since the path starts with '/', it is an absolute path, so the script will look for the files in a completely different place: directly under the root of your filesystem. But the files are not there; they are in your project folder. Try removing the leading '/' from the path.
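
To make the difference between the two forms of the flag concrete, here is a minimal sketch (not code from this repository) that reports where a given -d value resolves to and whether an rgb subfolder exists there. The helper name describe_data_dir is hypothetical, and the sketch assumes a POSIX-style filesystem.

```python
import os


def describe_data_dir(data_dir, subdir="rgb"):
    """Report where a -d value points and whether <data_dir>/<subdir> exists.

    Hypothetical helper for illustration; it is not part of taskgrouping.
    """
    resolved = os.path.abspath(data_dir)  # relative paths resolve against the cwd
    return {
        "is_absolute": os.path.isabs(data_dir),
        "resolved": resolved,
        "has_subdir": os.path.isdir(os.path.join(resolved, subdir)),
    }


if __name__ == "__main__":
    # Compare the absolute form from the original command with the relative form.
    for candidate in ("/taskonomy/", "taskonomy/"):
        print(candidate, describe_data_dir(candidate))
```

Run from the project folder, the relative form should resolve to a path inside that folder, while the absolute form resolves to a directory directly under the filesystem root.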

e96031413 commented 3 years ago

After removing the leading '/', I ran the following command:

python3 train_taskonomy.py -d=taskonomy/ -a=xception_taskonomy_new -j 4 -b 64 -lr=.1 --fp16 -sbn --tasks=sdnerac -r

I get the same error again.

tstandley commented 3 years ago

Do you have more than just the rgb folder inside taskonomy/?

Can you run 'ls -l taskonomy' for me?
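
Since ls -l only lists the top-level folders, a per-folder file count can also help spot a task directory that is empty or much smaller than the others. The following is a rough sketch rather than anything shipped with this repository: count_files_per_task is a hypothetical helper, and it makes no assumption about how the loader pairs files across tasks.

```python
import os


def count_files_per_task(data_dir):
    """Map each immediate subfolder of data_dir to the number of files beneath it.

    Hypothetical diagnostic helper; it is not part of taskgrouping.
    """
    counts = {}
    for entry in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, entry)
        if not os.path.isdir(path):
            continue  # skip plain files such as README.md
        # Walk the whole subtree so nested building folders are included.
        counts[entry] = sum(len(files) for _, _, files in os.walk(path))
    return counts


if __name__ == "__main__":
    data_dir = "taskonomy"  # same value as passed to -d
    if os.path.isdir(data_dir):
        for task, n in count_files_per_task(data_dir).items():
            print(f"{task:25s} {n}")
    else:
        print(f"{data_dir!r} is not a directory relative to {os.getcwd()!r}")
```

A folder that reports zero files, or far fewer than its siblings, is a reasonable first place to look.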

e96031413 commented 3 years ago

Yes, there is more than one folder.

$ ls -l taskonomy

total 8
drwxrwxrwx 1 root root 763196 Dec 19 12:11 class_object
drwxrwxrwx 1 root root 763196 Dec 19 12:11 class_scene
drwxrwxrwx 1 root root 819980 Dec 19 12:12 depth_euclidean
drwxrwxrwx 1 root root 782124 Dec 19 12:13 depth_zbuffer
drwxrwxrwx 1 root root 801052 Dec 19 12:14 edge_occlusion
drwxrwxrwx 1 root root 763196 Dec 19 12:16 edge_texture
drwxrwxrwx 1 root root 744268 Dec 19 12:17 keypoints2d
drwxrwxrwx 1 root root 744268 Dec 19 12:18 keypoints3d
drwxrwxrwx 1 root root 147184 Dec 19 12:18 nonfixated_matches
drwxrwxrwx 1 root root 649628 Dec 19 12:20 normal
drwxrwxrwx 1 root root 744268 Dec 19 12:20 point_info
drwxrwxrwx 1 root root 895692 Dec 19 12:21 principal_curvature
-rwxrwxrwx 1 root root   6057 Dec 19 12:11 README.md
drwxrwxrwx 1 root root 706412 Dec 19 12:22 reshading
drwxrwxrwx 1 root root     16 Dec 19 13:47 rgb
drwxrwxrwx 1 root root 706412 Dec 19 12:27 rgb_large
drwxrwxrwx 1 root root 819980 Dec 19 12:28 segment_semantic
drwxrwxrwx 1 root root 838908 Dec 19 12:28 segment_unsup25d
drwxrwxrwx 1 root root 819980 Dec 19 12:29 segment_unsup2d

cfifty commented 3 years ago

I received the same ValueError, and @tstandley's suggestion to remove the leading '/' in -d=/taskonomy/ fixed the issue for me.

There is also an issue with the segment_semantic data from the https://github.com/alexsax/taskonomy-sample-model-1 repo: the file names end in point_x_view_ydomainsegmentsemantic.png, which differs from the expected point_x_view_ydomainsegment_semantic.png.

For convenience, here's a script to quickly rename the files (replace cauthron with the directory name you chose, i.e. taskonomy_data/segment_semantic/cauthron/..).

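import os

# Directory holding the segment_semantic images; replace with your own path.
directory = "cauthron"
old_ending = "segmentsemantic.png"   # ending used in the sample repo (19 characters)
new_ending = "segment_semantic.png"  # expected ending

for filename in os.listdir(directory):
    # Skip files that were already renamed or are not segment_semantic images.
    if not filename.endswith(old_ending):
        continue
    renamed = filename[:-len(old_ending)] + new_ending
    os.rename(os.path.join(directory, filename), os.path.join(directory, renamed))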

tstandley commented 3 years ago

@e96031413 Were you ever able to get it working?

By the way, how did you download the Taskonomy dataset? The README only says "Get training data StanfordVL/taskonomy@master/data". So did you train with the full 12TB dataset?

Actually, I got the data from the authors before it was publicly available. Perhaps the layout was changed slightly, in the way @cfifty mentioned. In any event, I pre-processed the data down to 256x256 (both the RGB images and the labels). You are going to want to do that too if you want to get good performance. This brings the dataset size down to something like 2.4TB from 12TB. On top of that, you don't need the data for tasks you are not training, so you can leave out those folders, which might save another 1TB or so.

I do use all available building models and all of their images though. Dealing with a dataset that large does come with challenges, but I don't think it makes sense to leave performance on the table by not using all of the available data.