Open MischaPanch opened 10 months ago
The reason is in #486
just FYI, one simple reason might be that training batches are shuffled by default. Setting:

```python
batch.split(batch_size, shuffle=False, merge_last=True)
```

solves my problem.
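To illustrate why this matters, here is a minimal sketch (plain NumPy, not Tianshou's actual `Batch.split` implementation) of the difference: with `shuffle=True` the temporal order inside each minibatch is destroyed, so an RNN cannot propagate hidden states across consecutive steps; with `shuffle=False` each minibatch is a contiguous chunk of the rollout.

```python
import numpy as np

def split(indices, batch_size, shuffle, merge_last=True, seed=0):
    """Toy stand-in for minibatch splitting over a rollout of transitions."""
    idx = np.array(indices)
    if shuffle:
        # Destroys within-episode ordering -- harmful for RNN training.
        np.random.default_rng(seed).shuffle(idx)
    chunks = [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]
    if merge_last and len(chunks) > 1 and len(chunks[-1]) < batch_size:
        # Fold a short trailing chunk into the previous one.
        chunks[-2] = np.concatenate([chunks[-2], chunks[-1]])
        chunks = chunks[:-1]
    return chunks

steps = np.arange(10)  # time steps 0..9 of a single episode
for mb in split(steps, 4, shuffle=False):
    print(mb)  # contiguous chunks: [0 1 2 3] then [4 5 6 7 8 9]
```

With `shuffle=False` the RNN sees steps in episode order within each minibatch; the shuffled variant interleaves arbitrary time steps, which is fine for feed-forward policies but not for recurrent ones.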
So overall, including more information about episode time steps and ids, as well as avoiding shuffling when applying RNNs (at least within an episode), should be useful first steps. This issue can only be closed after performance has been thoroughly (and automatically) benchmarked.
The first part is related to #881 and #886 and should not be hard to fix.
The benchmarking is #935 and will not happen anytime soon. We can collect anecdotal evidence about performance here, though. I will ping you all once the changes to Batch and shuffling have been implemented.
@spacegoing Last time we spoke you mentioned that you wanted to give this a go. What's your status, still interested? :)
@MischaPanch Yes. I think I have more or less fixed the on-policy algo (PPO). It works correctly in my case, but I need more bad cases to double-check. AFAIK https://github.com/thu-ml/tianshou/issues/486 is the only on-policy related issue?
Once my implementation has passed all bad cases I'll open a pull request :D
Great, thanks, happy to hear that! I grouped all RNN-related issues under the label RNN; feel free to have a look if you want a quick overview of reported problems.
This is super helpful! I'll let you know once I've finished checking.
@MischaPanch hi, I don't know where to raise this issue properly: may I suggest we collect T+1 steps in the collector for the sake of value / GAE calculation, as well as for RNN hidden states?
As I covered more bad cases, I found that the current implementation of calculating values / GAEs adds unnecessary complexity.
Is there an unreleased branch that improves the value-calculation mechanisms? Many thanks!
Hi @spacegoing , sorry for the late response - the last week has been very hectic.
Sure, if the collection needs to be adjusted for RNNs to work, it makes sense to do it. There is no work on our side on this at the moment. Feel free to make a separate PR for this; it can be reviewed and merged quickly. No need for a dedicated issue to track it.
A related issue may be #893
@MischaPanch Thanks Michael, that issue is indeed a bug.
The T+1 proposal I raised is also related to https://github.com/thu-ml/tianshou/issues/886, remember the discussion we had? A quick summary:
On the other side, my proposal of collecting T+1 steps is related to point 3, so that the hidden states of RNNs can be easily calculated with the policy functions untouched. Otherwise, for each policy, we would need to modify its `process_fn` and `compute_returns`. The T+1 step would only be used for calculating hidden states / values (GAE etc.), and hopefully no other stats would be changed.
If there is no current progress on 1. and 2., I'm happy to write an MVP that integrates all 3 points and see if it introduces any discrepancies :D
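For context, here is a minimal sketch of why T+1 steps are natural for GAE: the TD residual at step t is δ_t = r_t + γV(s_{t+1}) − V(s_t), so computing advantages for T transitions requires T+1 value estimates, the last one being the bootstrap value of the final observed state. This is a generic textbook formulation, not Tianshou's `compute_returns` implementation.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation.

    rewards: shape (T,) -- rewards r_0 .. r_{T-1}
    values:  shape (T+1,) -- V(s_0) .. V(s_T); the extra entry is the
             bootstrap value that a T+1-step collection would provide.
    """
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        # TD residual uses the *next* state's value -> needs index t+1.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

r = np.array([1.0, 1.0, 1.0])
v = np.array([0.5, 0.5, 0.5, 0.0])  # T+1 = 4 value estimates
print(gae(r, v))
```

If the collector only stored T steps, the final `values[t + 1]` lookup would be impossible without a separate forward pass inside each policy, which is exactly the extra complexity the proposal above aims to remove.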
Sounds good, thank you! Let me know if you need help with that
Hi hi @spacegoing. Are you still interested in solving this? We have sorted out some of the more urgent issues in tianshou and now have some capacity for tackling RNN-related things.
@carlocagnetta will have a look at #881, which should not be too hard to solve and would be a useful addition to the RNN topic.
There is a bunch of reported RNN problems: #795 #513 #486 and some others. Generally, the performance of agents with an RNN on top has never seemed good. This should be looked into, compared with other libraries, and fixed.
@Trinkle23897 @araffin @BFAnas If you have ideas where to start or whom to ping, feel free to write here. I guess it won't be an easy one...
Until this is solved, it might make sense to document that using RNNs is currently discouraged.