onlytailei / carla_cil_pytorch

A pytorch implementation to train the conditional imitation learning policy in "Visual-based Autonomous Driving Deployment from a Stochastic and Uncertainty-aware Perspective".

how to split dataset? #8

Open sparshgarg23 opened 4 years ago

sparshgarg23 commented 4 years ago

Hi, I read the paper mentioned in the reference link. For that, 3 weather conditions are used for training and one for evaluation, right? However, a closer inspection of the dataset shows that it has 2 folders: TrainSeq with 3289 .h5 files and TrainVal with 374 .h5 files.

1. How do we know which of these .h5 files belong to the 4 weather categories mentioned in the supplement file of "VR-Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control"?
2. When I add up the numbers mentioned in that paper, I get 3224 sequences for training and 375 for testing, but the train and val sequences in the dataset are 3289 and 374. I would appreciate guidance on how to set up the dataset properly.
3. When running unit_test.py I ran into `OSError: Unable to open file (File signature not found)`. Some people mention this happens because .h5 files are not being closed, but carla_loader.py uses a context manager, so that issue should not arise. To work around it I changed the value of i=60 to i=10, and it works then. Unfortunately I cannot reproduce the error reliably, as it causes my system to hang. Do you think my system configuration cannot handle the processing? I am on Ubuntu 18, CPU only; would running this on a GPU make a difference? (A quick scan like the sketch below can at least isolate which files fail to open.)
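In case it helps, here is a quick scan along these lines; the folder name and the `rgb` dataset key are guesses from my setup. Since "File signature not found" usually indicates a truncated or corrupted .h5 file rather than an unclosed handle, this lists the files that actually fail to open:

```python
import glob
import h5py

# Hypothetical scan (folder name is a placeholder): try to open every
# .h5 file and touch a dataset, collecting the ones that raise OSError.
bad_files = []
for path in sorted(glob.glob('TrainSeq/*.h5')):
    try:
        with h5py.File(path, 'r') as f:
            _ = f['rgb'].shape  # touch a dataset to force a real read
    except OSError:
        bad_files.append(path)
print('\n'.join(bad_files) or 'all files opened cleanly')
```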

onlytailei commented 4 years ago

Actually we split it ourselves. If you extract the first image of each .h5 file, it is easy to categorize them by weather. Files recorded under the same weather condition are mostly clustered together already, so it is not much work.
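A minimal sketch of that extraction, assuming the standard CARLA .h5 layout with an `rgb` image dataset (the folder names here are placeholders): dump the first frame of every file as a thumbnail, then sort the thumbnails by weather condition visually.

```python
import glob
import os
import h5py
import imageio

# Write the first frame of every sequence to first_frames/ for manual
# inspection; adjust the input folder and dataset key to your layout.
os.makedirs('first_frames', exist_ok=True)
for path in sorted(glob.glob('TrainSeq/*.h5')):
    with h5py.File(path, 'r') as f:
        frame = f['rgb'][0]  # first image of the sequence, HxWx3 uint8
    name = os.path.splitext(os.path.basename(path))[0]
    imageio.imwrite(os.path.join('first_frames', name + '.png'), frame)
```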

Thank you for pointing out that our count is smaller than the total of the CARLA dataset. I just checked the training data we used; the numbers are the same as the ones we showed in the supplement file. I guess part of the original data was overlooked when we separated it by weather condition. Considering only about 2% is missing, I do not think it will influence the policy training much. But I am still sorry for this mistake.

For the data loading error, if you can provide a Docker environment that reproduces it, maybe I can help check it. Otherwise it is not easy to know what happened.
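For reference, a minimal Dockerfile along these lines should be enough; the base image tag and extra packages are guesses on my side, so adapt them to your local setup:

```dockerfile
# Hypothetical reproduction environment; pin whatever versions you
# actually use locally.
FROM pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime
RUN pip install h5py imageio
WORKDIR /workspace/carla_cil_pytorch
COPY . .
# Run the script that triggers the OSError.
CMD ["python", "unit_test.py"]
```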