microsoft / AutonomousDrivingCookbook

Scenarios, tutorials and demos for Autonomous Driving
MIT License
2.31k stars 564 forks

Tips for optimizing a model in a new environment? #57

Closed NextSim closed 6 years ago

NextSim commented 6 years ago

First of all, thank you for making these tutorials! They've been incredibly helpful for getting started with autonomous driving. My question is, do you have any tips for collecting useful data in new environments, and how to improve/optimize the model's performance?

Problem details

I've worked through the Autonomous Driving using End-to-End Deep Learning: an AirSim tutorial, and am now trying to apply the concepts discussed to train a model to drive in the AirSim Neighborhood environment. I've already collected training data (using a steering wheel), and, using the same network architecture and parameter values as the tutorial, I've been able to train a model where the car drives around for a few minutes before swerving off the road or crashing into a parked car, etc.

I have begun to try modifying different parameters and variables to optimize the model's performance, such as following the suggestions in the tutorials: modifying the region of interest, zero drop percentage, network architecture, etc. However, I've run into a few problems.

My main problem is that a trained model with lower validation loss doesn't necessarily correspond to a model with better driving performance. For example, one model I trained had a validation loss of .0002429 and crashed while making its first turn, but a different model with a val loss of .0005796 was able to drive pretty well for about five minutes. As a general trend, I've found that a lower val loss does indicate a better-performing model, but not reliably enough that I can use it to optimize performance. This has cost me quite a bit of time, as each time I make changes and retrain, I have to manually test the model and watch the car drive, instead of just being able to rely on minimizing the val loss.

My best guess as to why this is happening is poor training data. I understand that if the training data is "bad", then no matter how low you are able to get your val loss, your model will perform poorly, as it has learned to match the "bad" training data. I did my best to follow the ideas outlined in the Data Exploration and Preparation Jupyter notebook when I collected my training data. The majority of the data was collected by driving normally. I also collected data to cover edge cases and deviations from the ideal (like the swerve data does in the tutorial), but I have no idea how "good" the data I collected is. Since we're using an end-to-end model where almost everything is abstracted away, I don't know whether this is actually a problem with my data or with something else.

At this point, I've just been iteratively modifying and testing the model manually. I know there's got to be a better way to approach this, but I'm at a loss for what to try. It's difficult to optimize performance when it doesn't directly correspond to validation loss or another quantitative value (as far as I can tell). I would greatly appreciate any tips anyone can offer for collecting better training data or improving model performance. Thank you!

adshar commented 6 years ago

Hi @NextSim

There are multiple things you could try here. Please remember that the neighborhood environment is much more complicated than the landscape environment: it contains things like parked cars and intersections, which you didn't have to deal with in the landscape environment. This means that:

  1. You will have to collect more data overall to ensure you have enough training samples to account for the different visuals the car will see.
  2. You will need to collect data for some specific use cases, like 90-degree turns, avoiding parked cars, etc. You will also need to make sure that your dataset is balanced.
  3. Since there is a lot more going on in the image frames compared to the landscape environment, you will probably need a better network architecture.
  4. Lastly, it might be possible that training your network from scratch just requires too much data in this environment. You might want to look into transfer learning. We introduced the concept in the Distributed Deep Reinforcement Learning for Autonomous Driving tutorial. This is also a great example of the limitations of a supervised approach to training in the autonomous driving context. Reinforcement learning lets you potentially collect an infinite amount of data without the need for a human in the loop. As you will see in the tutorial, the car learns to perfectly drive around the neighborhood environment after it learns using reinforcement.
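The dataset-balancing point above is what the end-to-end tutorial's zero-drop idea addresses: most frames from normal driving have near-zero steering, so a fraction of them is discarded to keep turns from being drowned out. A minimal sketch of that idea (the function and parameter names here are illustrative, not the tutorial's actual code):

```python
import random

def balance_dataset(samples, zero_drop_percentage=0.9, zero_threshold=0.01, seed=0):
    """Drop a fraction of near-zero-steering samples so turning frames
    are not drowned out by straight-driving frames.

    samples: list of (image_path, steering_angle) pairs.
    """
    rng = random.Random(seed)
    kept = []
    for image_path, angle in samples:
        if abs(angle) < zero_threshold and rng.random() < zero_drop_percentage:
            continue  # discard this straight-driving frame
        kept.append((image_path, angle))
    return kept
```

All turning frames survive, while roughly `1 - zero_drop_percentage` of the straight-driving frames remain.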
mitchellspryn commented 6 years ago

@adshar gave some good advice, but there are a few points I'd like to add regarding the current model, and things you may be able to try to improve it.

1) You aren't learning the metric that you are trying to optimize. Your model is optimizing MSE (or something similar) between predicted and actual steering angle. It sounds like you are measuring success by "time on the road." These are related, but different objectives. This is why, as you've observed, you can have a model with lower loss perform worse because the loss you've optimized for is not the same metric that you are measuring. This is why it's critical to strictly define the metrics that you are using for measuring model performance.

Unfortunately, if you are trying to optimize "time on the road," there isn't a really good way to do this without running the car for each model. This makes offline learning difficult - thus an online learning approach (like reinforcement learning) may be better suited.
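If you do settle on "time on the road" as the metric, the per-model evaluation can at least be scripted rather than watched by hand. A minimal, simulator-agnostic sketch (the two callbacks are assumptions: `step_fn` would drive one control step with the model under test, and `off_road_fn` would be something you build yourself, e.g. on top of AirSim's collision info, since AirSim does not report "off road" directly):

```python
import time

def time_on_road(step_fn, off_road_fn, max_seconds=300.0, poll_interval=0.0):
    """Drive with the candidate model until the car leaves the road,
    collides, or time runs out. Returns the number of seconds survived,
    which is the metric to compare candidate models on.

    step_fn():     advances the car one control step using the model under test.
    off_road_fn(): returns True once the car is off the road or has collided.
    """
    start = time.monotonic()
    while time.monotonic() - start < max_seconds:
        step_fn()
        if off_road_fn():
            break
        if poll_interval:
            time.sleep(poll_interval)
    return time.monotonic() - start
```

Running this for each trained checkpoint gives a comparable number per model, even though it still costs a full simulated drive per evaluation.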

2) Getting a representative dataset for the problem is challenging. This is distinctly different from getting "enough" data. In addition to the overall quantity, the distribution of steering angles contained within your dataset should match the distribution seen in typical driving. For example, if a steering angle of 0 appears 60% of the time in average driving, a dataset in which zero steering angles appear 50% of the time is a problem. You may be able to model the data perfectly (thus having a low loss), but you've learned the wrong distribution, so the model will perform poorly, regardless of how much data you have.

To fix this, try driving around normally for a long time and recording the distribution of labels that you see. Then, sample your dataset during training to ensure that the underlying distributions match.
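One way to do that matching: bin the steering angles, then draw training samples with per-sample weights that bring the training histogram in line with the reference histogram from normal driving. A sketch with NumPy (function and parameter names are mine, not from the tutorial):

```python
import numpy as np

def match_label_distribution(angles, ref_angles, n_bins=21, n_samples=None, seed=0):
    """Return indices into `angles` resampled so its steering-angle
    histogram matches that of `ref_angles`.

    angles:     steering labels of the collected training set.
    ref_angles: labels recorded during long stretches of normal driving,
                i.e. the distribution the deployed model will actually see.
    """
    rng = np.random.default_rng(seed)
    angles = np.asarray(angles, dtype=float)
    bins = np.linspace(-1.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(angles, bins) - 1, 0, n_bins - 1)
    ref_hist, _ = np.histogram(ref_angles, bins=bins)
    ref_p = ref_hist / ref_hist.sum()
    # per-sample weight = desired bin mass / available samples in that bin
    counts = np.bincount(which, minlength=n_bins)
    w = ref_p[which] / np.maximum(counts[which], 1)
    w = w / w.sum()
    if n_samples is None:
        n_samples = len(angles)
    return rng.choice(len(angles), size=n_samples, replace=True, p=w)
```

The returned index array can feed a training-data generator, so each epoch sees the reference distribution regardless of how the raw data was collected.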

3) Your loss function doesn't take into account that some mistakes are worse than others. Consider two scenarios in which the model predicts a hard right turn when the correct action is to go straight:

(A) the car is in the middle of the road;
(B) the car is already at the right edge of the road.

Both are obviously bad. But (A) is recoverable (by subsequently taking a hard left), and (B) is not (you crash instantly). To the loss function, both of these scenarios contribute the same loss. But in reality, the model should be penalized more for making error (B) than error (A), as the consequences are worse.

To fix this, you can consider adding a hidden variable for the car's position on the road to the loss function, penalizing the model more for steering the wrong way when it is far from the center of the road. This should not be an input feature to the model, as there isn't a great way of getting this data in real time.
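One way to implement that idea is as a per-sample weight: record the distance from the road center at data-collection time and use it to scale each sample's error during training, never as a model input. A sketch (names and the `alpha` knob are mine; in Keras the same effect can be had by passing the weights as `sample_weight` to `model.fit` with an ordinary `'mse'` loss):

```python
import numpy as np

def center_weighted_mse(y_true, y_pred, dist_from_center, alpha=2.0):
    """MSE where each sample's error is scaled by how far the car was
    from the road center when the frame was recorded.

    dist_from_center: 0.0 at the center of the lane, 1.0 at the edge.
    alpha: how much more harshly off-center mistakes are penalized.
    The distance only weights the loss -- it is NOT a model input,
    since it is not observable at inference time.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = 1.0 + alpha * np.asarray(dist_from_center, dtype=float)
    return float(np.mean(w * (y_true - y_pred) ** 2))
```

The same steering error then costs three times as much at the road edge (`alpha=2.0`) as at the center.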

4) Your input is ambiguous. The landscape environment has only a single road, so from a single image there is a single, unambiguously correct action. Neighborhood has intersections, at which there can be multiple correct actions (e.g. "turn left" and "turn right"). If half of your labels say "turn left" and the other half say "turn right," the model will average them and come up with "go straight," which is probably undesirable.

Accounting for this is tricky. One thing you could try - for those images where turning left and turning right are both acceptable (i.e. the "beginning" of a turn), you can modify your loss function to operate on absolute values - that is, instead of (y - y_pred)^2, use (abs(y) - abs(y_pred))^2. But you'll need to be careful that you don't take this too far - once you've gone halfway through a turn, aborting and turning the other way is a bad idea.
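That suggestion, as a per-sample sketch: frames flagged as ambiguous (the beginning of an intersection turn) are scored on magnitude only, everything else with the ordinary squared error. The ambiguity flag is an assumption here; it would have to come from labeling at data-collection time:

```python
import numpy as np

def ambiguity_aware_loss(y_true, y_pred, is_ambiguous):
    """Squared error, except on frames where turning either way is acceptable.

    is_ambiguous: boolean per-sample flag (e.g. hand-labeled at the start
    of intersection turns). On those frames only the magnitude of the
    steering command is penalized, so "turn left" and "turn right" labels
    are not averaged into "go straight".
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    amb = np.asarray(is_ambiguous, dtype=bool)
    err = np.where(amb,
                   np.abs(y_true) - np.abs(y_pred),   # (|y| - |y_pred|)
                   y_true - y_pred)                    # ordinary residual
    return float(np.mean(err ** 2))
```

On an ambiguous frame, predicting a right turn against a left-turn label of the same magnitude incurs zero loss; on a normal frame it incurs the full squared error, which matches the caution above about not applying this mid-turn.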

NextSim commented 6 years ago

Okay, I think I understand the problem a lot better now. I'll check out the Reinforcement Learning tutorial and try your suggestions. Thank you!!

NextSim commented 6 years ago

Hi,

We've been running training with reinforcement learning. It ran over the weekend, but the driving is still very poor. It frequently can't make a turn, or it randomly turns into the side of the road.

@adshar "As you will see in the tutorial, the car learns to perfectly drive around the neighborhood environment after it learns using reinforcement."

If this is learning locally, how long should I expect this to take? We can't get it anywhere near driving perfectly around the neighborhood. Also, should it be able to drive around the neighborhood for several minutes once trained?

Sometimes it looks like the algorithm thinks that driveways are roads and tries to turn down them... have you seen this?

For now we've left the hyperparameters (batch_update_frequency, max_epoch_runtime_sec, etc.) at their defaults.

But we're not really seeing it converge. Sometimes the car immediately makes a sharp left or right turn on a straight street.

We've thought about modifying the reward function to see if that helps, but we're struggling to think of something that would be helpful that isn't extremely complex.

mitchellspryn commented 6 years ago

Are you starting from scratch, with no pretraining? If so, it can take more than a week to train fully.

During training, the epsilon for the linear epsilon-annealing schedule never reaches zero (I think 0.1 is the lowest it goes), so the model will still be making some choices completely at random. Make sure that you are actually using the test code rather than the training code when you are trying to get a sense of how the model performs.
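The schedule being described, sketched out: epsilon decays linearly from 1.0 to a floor (0.1 per the comment above), and at test time you force epsilon to 0 so the model acts greedily. The `anneal_steps` value here is illustrative, not the tutorial's actual setting:

```python
def linear_epsilon(step, anneal_steps=100000, eps_start=1.0, eps_min=0.1):
    """Linearly annealed exploration rate for epsilon-greedy action selection.

    During training epsilon never drops below eps_min, so some fraction of
    actions stays random; at test time use epsilon = 0 (act greedily) instead.
    """
    frac = min(step / float(anneal_steps), 1.0)
    return eps_start + frac * (eps_min - eps_start)
```

With these defaults, 10% of actions remain random no matter how long training runs, which is why a training-mode rollout understates the trained policy's real driving quality.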

NextSim commented 6 years ago

We have tried both methods (starting from scratch and pretraining) with about the same results. I will keep letting it run.

I double checked our training parameters because your comment about the random actions stuck out to me... I had noticed that very quickly the "Percent random actions" was 0. Someone had changed the min value to 0.003. I have reset it to 0.1 and have restarted training.

Thanks!