thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License
7.6k stars 1.1k forks

RNN related issues #937

Open MischaPanch opened 10 months ago

MischaPanch commented 10 months ago

There are a bunch of reported RNN problems: #795, #513, #486, and some others. Generally, the performance of agents with an RNN on top has never seemed good. This should be looked into, compared with other libraries, and fixed.

@Trinkle23897 @araffin @BFAnas If you have ideas where to start or whom to ping, feel free to write here. I guess it won't be an easy one...

Until this is solved, it might make sense to document that using RNNs is currently discouraged.

Trinkle23897 commented 10 months ago

The reason is in #486

spacegoing commented 10 months ago

Just FYI, one simple reason might be that training batches are shuffled by default. Setting batch.split(batch_size, shuffle=False, merge_last=True) solved my problem.
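For illustration, here is a minimal sketch of that workaround using tianshou's Batch (toy data, single episode):

```python
import numpy as np
from tianshou.data import Batch

# Eight consecutive transitions from one episode (toy data).
batch = Batch(obs=np.arange(8), act=np.zeros(8))

# shuffle=False keeps the temporal order of the steps; merge_last=True
# folds a short trailing chunk into the previous minibatch instead of
# yielding it on its own.
for minibatch in batch.split(3, shuffle=False, merge_last=True):
    print(minibatch.obs)  # [0 1 2], then [3 4 5 6 7]
```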

MischaPanch commented 10 months ago

So overall, including more information about episode time steps and ids, as well as avoiding shuffling when applying RNNs (at least within an episode), should be useful first steps (a sketch of episode-aware shuffling follows below). This issue can only be closed after performance has been thoroughly (and automatically) benchmarked.

The first part is related to #881 and #886 and should not be hard to fix.

The benchmarking is #935 and will not happen anytime soon. We can collect anecdotal evidence about performance here, though. I will ping you all once the changes to Batch and shuffling have been implemented.
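To make the "avoid shuffling within an episode" idea concrete, here is a sketch that assumes each transition carries a hypothetical ep_id field (not something tianshou's Batch provides today): whole episodes are shuffled relative to each other, but the time order inside each episode is preserved.

```python
import numpy as np

def episode_shuffled_indices(ep_ids, rng=None):
    # ep_ids: per-transition episode id, e.g. [0, 0, 0, 1, 1] (hypothetical field).
    if rng is None:
        rng = np.random.default_rng()
    ids = np.asarray(ep_ids)
    # Collect the indices of each episode in their original temporal order.
    episodes = [np.flatnonzero(ids == e) for e in np.unique(ids)]
    rng.shuffle(episodes)  # permute whole episodes, not individual steps
    return np.concatenate(episodes)

# episode_shuffled_indices([0, 0, 0, 1, 1]) may yield [3 4 0 1 2]
```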

MischaPanch commented 8 months ago

@spacegoing Last time we spoke you mentioned that you wanted to give this a go. What's your status, still interested? :)

spacegoing commented 8 months ago

@MischaPanch Yes. I think I have more or less fixed the on-policy algo (PPO). It works correctly in my case; I need more bad cases to double-check. AFAIK https://github.com/thu-ml/tianshou/issues/486 is the only on-policy related issue?

Once my implementation has passed all bad cases I'll open a pull request :D

MischaPanch commented 8 months ago

Great, thanks, happy to hear that! I grouped all RNN-related issues under the label RNN; feel free to have a look if you want a quick overview of the reported problems.

spacegoing commented 8 months ago

This is super helpful! I'll let you know once I've finished checking.

spacegoing commented 8 months ago

@MischaPanch Hi, I don't know where to raise this properly: may I suggest we collect T+1 steps in the collector, for the sake of the value/GAE calculation (sketched below) as well as the RNN hidden states?

As I covered more bad cases, I found that the current implementation of calculating values/GAEs adds unnecessary complexity.

Is there an unreleased branch improving the value-calculation mechanisms? Many thanks!
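For context, a toy GAE computation illustrating why the (T+1)-th value is needed: the advantage at the final step T depends on V(s_{T+1}). This is a generic single-episode sketch that ignores done flags, not tianshou's implementation.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    # rewards: shape (T,); values: shape (T + 1,), i.e. includes V(s_{T+1}).
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```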

MischaPanch commented 8 months ago

Hi @spacegoing, sorry for the late response - the last week has been very hectic.

Sure, if the collection needs to be adjusted for RNNs to work, it makes sense to do it. There is no work on our side on this at the moment. Feel free to make a separate PR for this; it can be reviewed and merged quickly. No need for a dedicated issue to track it.

A related issue may be #893

spacegoing commented 8 months ago

@MischaPanch Thanks Michael, that issue is a bug indeed.

The T+1 proposal I raised is also related to https://github.com/thu-ml/tianshou/issues/886, remember the discussion we had? I'll give a quick summary here:

  1. It isn't necessary to recompute log_prob before learn(); the log_prob from the forward pass can be reused.
  2. We can also compute the value during the forward pass rather than in process_fn.
  3. v_s_ is currently recomputed from batch.obs_next instead of being shifted from v_s; it should be obtained by shifting v_s by one step, with only the final step T computed from batch.obs_next (see the sketch after this list).
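A minimal sketch of point 3, with an illustrative critic(obs) -> (N,) value function rather than tianshou's actual internals; it is only valid when the batch holds the steps of one episode in temporal order:

```python
import torch

def shifted_values(critic, batch):
    # Illustrative helper, not tianshou API: compute V(s_t) for all steps
    # with one pass, then derive v_s_ by shifting instead of re-running
    # the critic on every obs_next.
    with torch.no_grad():
        v_s = critic(batch.obs)               # V(s_1..s_T), shape (T,)
        v_last = critic(batch.obs_next[-1:])  # one extra pass for V(s_{T+1})
        v_s_ = torch.cat([v_s[1:], v_last])   # v_s_[t] == V(s_{t+1})
    return v_s, v_s_
```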

On the other side, my proposal of collecting T+1 steps is related to point 3: the RNN hidden states can then be computed easily while leaving the policies' functions untouched. Otherwise, we would need to modify process_fn and compute_returns for each policy. The (T+1)-th step would only be used for calculating hidden states / values (GAE etc.), and hopefully no other stats would change.

If there is no current progress on 1. and 2., I'm happy to write an MVP that integrates all three points and see whether it introduces any discrepancies :D

MischaPanch commented 8 months ago

Sounds good, thank you! Let me know if you need help with that

MischaPanch commented 5 months ago

Hi hi @spacegoing. Are you still interested in solving this? We have sorted out some of the more urgent issues in tianshou and have some capacity for tackling RNN-related things.

@carlocagnetta will have a look at #881, which should not be too hard to solve and would be a useful addition for the RNN topic.