Some question about "Data Generation" in GitHub

opendilab / InterFuser

[CoRL 2022] InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

Apache License 2.0

Some question about "Data Generation" in GitHub #81

Open JunyongYun-SPA opened 8 months ago

JunyongYun-SPA commented 8 months ago

I have a few questions and I'm posting them.

1. Do I have to generate data in advance through the "Data Generation" process described on GitHub in order to train the model? Or is it possible to train the model right away without the Data Generation process?
2. If data generation is required, is there a more efficient way to do it? Generating data for all towns and all weathers seems to take too long.
3. In the Town05 long benchmark, the test set is Town05 long, but exactly what data is used for the training set? As far as I know, there are about 576 (2183) data, covering 21 weathers, 8 towns, and the long, short, and tiny route types.

Thank you.

deepcs233 commented 8 months ago

Hi!

1. Yes, you need to collect the dataset first if you want to train the model.
2. You can start multiple CARLA servers at the same time and run the batch scripts we provide. Collection may take up to two weeks.
3. We use the data from the other towns as the training set.
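As a rough illustration of the "multiple CARLA servers" suggestion, the sketch below only builds the launch commands, one per server, each on its own port. The launcher path, the port spacing, and the helper name `carla_server_commands` are assumptions made for this example; `-carla-rpc-port` is the port flag documented by CARLA, but it should be checked against the CARLA version in use, and the repository's own batch scripts remain the reference.

```python
from typing import List


def carla_server_commands(
    num_servers: int,
    base_port: int = 2000,
    port_step: int = 10,
    launcher: str = "./CarlaUE4.sh",  # assumed launcher path
) -> List[List[str]]:
    """Build one launch command per CARLA server, each on a distinct port."""
    commands = []
    for index in range(num_servers):
        # Space the ports out so that neighbouring servers do not collide.
        port = base_port + index * port_step
        commands.append([launcher, f"-carla-rpc-port={port}"])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in carla_server_commands(4):
        print(" ".join(command))
```

Each command list could then be passed to `subprocess.Popen`, with one data-collection client pointed at each port.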

soyaCat commented 8 months ago

Thank you for your response. I am a co-worker of the person who asked the first question. I understand the answers to the first and second questions, but I still have some doubts about the third. In the Town05 benchmark, does "data from the other towns as the training set" mean the datasets generated for the long, short, and tiny routes of each town under all 21 weather conditions?

deepcs233 commented 8 months ago

Hi! We didn't separate the training data by weather because the Town05 benchmark doesn't require it. Besides, "long, short, and tiny" only denote the route length; they share the same town map and traffic scenarios. Our framework takes single-frame data as input, so we also didn't separate the data by "long, short, and tiny".

During data collection, we didn't sort the collected data into 576 data folders. We just collected the data and named the folders like "Town05_long_weather13_22_15_14". When training, we can then choose the data folders we need according to their names. "576 (2183) data" is not the total size of our dataset. Each type (short or long) may have 5-200 routes. For example, "short route" doesn't mean one specific route; it means a type of route. You can refer to https://github.com/opendilab/InterFuser/blob/main/leaderboard/data/training_routes/routes_town01_short.xml
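Choosing folders by name, as described above, could look roughly like the following sketch. It assumes every folder follows the pattern of the one example given, "Town05_long_weather13_22_15_14", and treats the trailing "22_15_14" part as an opaque suffix, since its meaning is not stated in the thread. The function names and the second folder name in the usage example are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed layout: Town<NN>_<route type>_weather<NN>_<opaque suffix>
FOLDER_PATTERN = re.compile(r"^Town(\d+)_(tiny|short|long)_weather(\d+)(?:_.*)?$")


def parse_folder_name(name: str) -> Optional[Tuple[int, str, int]]:
    """Return (town, route_type, weather), or None if the name does not match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3))


def select_training_folders(names: List[str], held_out_town: int = 5) -> List[str]:
    """Keep folders from every town except the held-out evaluation town."""
    selected = []
    for name in names:
        parsed = parse_folder_name(name)
        if parsed is not None and parsed[0] != held_out_town:
            selected.append(name)
    return selected


if __name__ == "__main__":
    folders = [
        "Town05_long_weather13_22_15_14",
        "Town01_short_weather02_10_11_12",  # hypothetical folder name
    ]
    print(select_training_folders(folders))
```

For a benchmark that restricts weather, the same parsed tuple could be used to filter on the weather index as well.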

JunyongYun-SPA commented 7 months ago

Thank you for your quick response.

However, there are still some questions.

1. You said short, long, and tiny only represent the length of the route, so shouldn't town01_long and town01_short contain the same number of routes? But town01_long has 10 routes and town01_short has 22. My understanding is that the long and short routes are different routes. Is that correct?
2. You said you don't separate the training data by weather, so doesn't the training data include all 21 weathers? If not, by what criteria did you choose the weather for each town?
3. I'm sorry, there was a typo in my question. The 576 (2183) I mentioned should be 576 (21x8x3). In other words, if we collect data for all towns (8), all weathers (21), and all types (long, short, and tiny), we expected approximately 576 (21x8x3) folders to be created. But according to you, that is wrong, right?

Thank you.

deepcs233 commented 7 months ago

Hi!

1. You can check this folder, which may answer your question: https://github.com/opendilab/InterFuser/blob/main/leaderboard/data/training_routes/
2. For evaluation on the Town05 benchmark, weather conditions are not restricted, so we use all weathers. If you run other benchmarks, you may need to filter out some weather conditions for training.
3. Yes. Each type of route (tiny/short/long) contains multiple routes, so thousands of folders are created instead.
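To see why the folder count is driven by the number of routes rather than by the town/weather/type combinations alone, the routes in a route file can be counted directly. The sketch below assumes the file has a `<routes>` root with one `<route>` element per route; the inline sample is hypothetical and not copied from the repository. For reference, the product 21 x 8 x 3 evaluates to 504 combinations.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the assumed <routes>/<route> layout.
SAMPLE_ROUTES_XML = """<routes>
  <route id="0" town="Town01">
    <waypoint x="1.0" y="2.0" z="0.0"/>
    <waypoint x="3.0" y="4.0" z="0.0"/>
  </route>
  <route id="1" town="Town01">
    <waypoint x="5.0" y="6.0" z="0.0"/>
  </route>
</routes>"""


def count_routes(xml_text: str) -> int:
    """Count the <route> elements directly under the root element."""
    root = ET.fromstring(xml_text)
    return len(root.findall("route"))


def combination_count(num_weathers: int, num_towns: int, num_types: int) -> int:
    """Number of (weather, town, route type) combinations, ignoring routes."""
    return num_weathers * num_towns * num_types


if __name__ == "__main__":
    print(count_routes(SAMPLE_ROUTES_XML))
    print(combination_count(21, 8, 3))
```

Running `count_routes` on the contents of each file in the training_routes folder would give the per-town, per-type route counts mentioned in the thread.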

No4x commented 7 months ago

Hello, I'm also very interested in the results of data generation. Because I integrated some other sensors, some routes perform worse than expected. About 2-30% of the routes in a town may fail. Is this normal?

deepcs233 commented 7 months ago

That's ok. What's important is to make sure you collect enough data within safe controls. If you need to, you can drop the frames from the failed cases to improve the data quality.
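Dropping the data from failed routes could be sketched as below. The mapping from folder name to a status string, the "Completed" and "Failed" labels, and the folder names are all hypothetical; how a failure is actually recorded depends on the collection setup.

```python
from typing import Dict, List


def keep_successful_folders(
    folder_status: Dict[str, str],
    ok_status: str = "Completed",  # assumed label for a successful route
) -> List[str]:
    """Return the folder names whose route finished with the accepted status."""
    return sorted(
        name for name, status in folder_status.items() if status == ok_status
    )


def failure_rate(folder_status: Dict[str, str], ok_status: str = "Completed") -> float:
    """Fraction of routes that did not finish with the accepted status."""
    if not folder_status:
        return 0.0
    failed = sum(1 for status in folder_status.values() if status != ok_status)
    return failed / len(folder_status)


if __name__ == "__main__":
    # Hypothetical statuses for three collected folders.
    statuses = {
        "Town01_short_weather02_a": "Completed",
        "Town01_short_weather02_b": "Failed",
        "Town02_long_weather07_a": "Completed",
    }
    print(keep_successful_folders(statuses))
    print(failure_rate(statuses))
```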

No4x commented 7 months ago

Thanks.