uzh-rpg / sim2real_drone_racing

A Framework for Zero-Shot Sim2Real Drone Racing
http://rpg.ifi.uzh.ch/research_drone_racing.html
MIT License

Seems like the DAgger strategy mentioned in the paper is not used when collecting the data and training. #8

Open tianqi-wang1996 opened 4 years ago

tianqi-wang1996 commented 4 years ago

In the paper, you first let the expert policy fly for 40 s to collect data and train the network for 10 epochs on the accumulated data. In the next run, the trained network navigates while the expert policy labels the visited states, which are added to the dataset as augmented data; if the distance from the global trajectory exceeds a margin, control switches back to the expert policy. If the network needs the expert's help fewer than 50 times to complete the track, the margin is increased by 0.5 m.
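To make sure I am reading the paper correctly, here is a minimal Python sketch of the DAgger-style loop as I understand it. All the callables (`fly_expert`, `fly_network`, `expert_action`, `distance_to_reference`, `train`) and the initial margin value are placeholders of mine, not functions or values from this repository:

```python
def dagger_style_training(network, fly_expert, fly_network, expert_action,
                          distance_to_reference, train, num_iterations=10):
    """Sketch of the data-collection loop as described in the paper (not repo code)."""
    margin = 0.5  # switching margin in metres (assumed initial value)

    # Bootstrap: let the expert fly for 40 s and train on that data.
    dataset = list(fly_expert(duration_s=40))
    network = train(network, dataset, epochs=10)

    for _ in range(num_iterations):
        interventions = 0
        # The partially trained network navigates; the expert labels every visited state.
        for observation, state, give_control_to_expert in fly_network(network):
            dataset.append((observation, expert_action(state)))
            # When the network drifts too far from the global trajectory,
            # hand control back to the expert for recovery.
            if distance_to_reference(state) > margin:
                interventions += 1
                give_control_to_expert()
        network = train(network, dataset, epochs=10)
        # Relax the margin once the expert had to intervene fewer than 50 times.
        if interventions < 50:
            margin += 0.5
    return network
```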

I have two questions here:

  1. I've followed the instructions posted in this repository to collect data and train the network, and I've also looked through parts of the code. It seems that instead of the DAgger strategy, only the normal data-collection and training pipeline is used here, in which no augmented data is collected from rollouts of the partially trained network.
  2. In the DAgger strategy, why do you increase the margin when the trained network is doing well? Shouldn't we instead be stricter and decrease the margin so that the trained network eventually performs well enough on its own?