There is a detach() in location_network() where mu is computed from h_t, and the same is done for the baseline (value estimate). Is this required? If so, the log_prob loss does not train the RNN at all; it only trains the fc layer that computes mu.
Is this correct?
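
To make the question concrete, here is a minimal sketch of what I think is happening. It assumes PyTorch, and the module names (`rnn`, `fc_mu`), the layer sizes, and the fixed std of 0.1 are hypothetical stand-ins, not the repository's actual code. It only checks whether a log_prob loss produces a gradient on the RNN weights with and without the detach().

```python
import torch
import torch.nn as nn


def grad_reaches(use_detach):
    """Return (rnn_gets_grad, fc_gets_grad) for a REINFORCE-style log_prob loss."""
    torch.manual_seed(0)
    # Hypothetical stand-ins for the recurrent core and the location head.
    rnn = nn.RNNCell(input_size=4, hidden_size=8)
    fc_mu = nn.Linear(8, 2)

    x = torch.randn(1, 4)
    h_prev = torch.zeros(1, 8)
    h_t = rnn(x, h_prev)

    # The line in question: optionally cut h_t out of the autograd graph.
    feat = h_t.detach() if use_detach else h_t
    mu = torch.tanh(fc_mu(feat))

    dist = torch.distributions.Normal(mu, 0.1)
    l_t = dist.sample()  # sampling is not differentiated through
    log_prob = dist.log_prob(l_t).sum()
    (-log_prob).backward()

    def has_grad(p):
        return p.grad is not None and bool(p.grad.abs().sum() > 0)

    return has_grad(rnn.weight_ih), has_grad(fc_mu.weight)


print("with detach   :", grad_reaches(use_detach=True))
print("without detach:", grad_reaches(use_detach=False))
```

In this sketch the RNN weights get no gradient from the log_prob term when h_t is detached, while the fc layer still does. Whether that is the intended behaviour in the actual model is what I am asking.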