In the paper, you first let the expert policy fly for 40 s to collect data and train the network for 10 epochs on the accumulated data. In the next run, you use the trained network to navigate and exploit the expert policy to label the visited states as augmented data; if the distance from the global trajectory exceeds a margin, you switch control back to the expert policy. If the network needs the expert's help fewer than 50 times to complete the track, you increase the margin by 0.5 m.
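To make sure I'm reading this correctly, here is a rough sketch of my understanding of that loop. Every function name below (collect_expert_rollout, fly_with_network, expert_label, train) and the default margin value are placeholders I made up for illustration, not code from this repo:

```python
# Sketch of how I understand the procedure described above. All callables
# are supplied by the caller; none of these names come from this repository.

def dagger_training_loop(collect_expert_rollout, fly_with_network,
                         expert_label, train,
                         margin=0.5, margin_step=0.5, max_interventions=50):
    # Initial run: the expert flies for 40 s and the network is trained
    # for 10 epochs on that data. (Initial margin value is assumed.)
    dataset = list(collect_expert_rollout(duration_s=40))
    network = train(dataset, epochs=10)

    while True:
        interventions = 0
        # The partially trained network navigates; the expert labels the
        # visited states so they can be added to the dataset (data aggregation).
        for state, dist_to_global_traj in fly_with_network(network):
            dataset.append((state, expert_label(state)))
            if dist_to_global_traj > margin:
                # Too far from the global trajectory: control is handed
                # back to the expert policy.
                interventions += 1

        # Retrain for 10 epochs on all accumulated data.
        network = train(dataset, epochs=10)

        # Fewer than 50 expert take-overs on a full lap -> relax the margin.
        if interventions < max_interventions:
            margin += margin_step
```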
I have two questions here:
1. I've followed the instructions posted in this repository to collect data and train the network, and I've also looked through parts of the code. It seems that, instead of the DAgger strategy, only the plain data-collection and training pipeline is used here, with no augmented data collected from running the partially trained network. Is that correct, or am I missing something?
2. Regarding the DAgger strategy itself: why do you increase the margin when the trained network is doing well? Shouldn't we instead be stricter and decrease the margin, so that the trained network eventually performs well enough on its own?