thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License
7.83k stars 1.12k forks source link

Implementation design issues in SubprocVectorEnv #573

Open duburcqa opened 2 years ago

duburcqa commented 2 years ago

Currently, in SubprocVectorEnv, there is a single send method to do both reset and step. So, it relies on the action being None to call reset instead of step. Thus is a bad design choice because there is no reason to forbid the actual action to be None. In particular, in Jiminy simulator, it means re-using the previous action for the current step. Not to mention that send is not clear at all, which is already a good reason to rename it.

Secondly, the list of methods that can be called by subprocesses is hard-coded here, which excessively restricted for no actual reason. In my case, my environments have many other methods apart from the standard one, and it should be possible to extend SubprocVectorEnv to add extra methods, which is not the case because of this. I suggest modifying the else clause:

else:
    if not hasattr(env, cmd):
        raise NotImplementedError
    p.send(getattr(env, cmd)(data))

It should be possible to be generic enough to get rid of "reset", "render" and "seed", and "getattr" elif clauses.

PS: I'm planning my comeback in the next few months, hopefully for long :smiley:

Trinkle23897 commented 2 years ago

Hello my old friend! Nice to hear from that great news!

I change this to support EnvPool async mode little by little. in EnvPool, both the step and reset are split into two folds:

and actually there's no need to treat reset and step differently. If we apply an auto-reset environment wrapper, and treat "step(action) while done=True" as reset, all operations can be vectorized and achieve the best performance. This approach also requires to refactor the collector's code, but after this refactor, collector will be much cleaner.

duburcqa commented 2 years ago

I see what you mean, and I mostly agree, but in my view what you describe should be a higher level API than what SubprocVectorEnv is supposed to be. SubprocVectorEnv is the lowest-level of control you can possibly have and as such all restriction must be very well motivated, which is not the case here. SubprocVectorEnv is not limited to learning. It could be used for testing, video recording...etc In this context, it makes sense to be able to reset even if done is false, and to continue step even though done has been true once in the past. It is not part of OpenAI Gym specification to have to reset the env once done is receive, and it is actually a case that is handle in the official repo as far as I remember for the toy envs. It just throws a warning to make sure the user is aware of it but it can still continue stepping. It can be very useful to debug and analyze complex env.

araffin commented 2 years ago

It should be possible to be generic enough to get rid of "reset", "render" and "seed", and "getattr" elif clauses.

You are probably looking for env_method as done in SB3 ;) https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/subproc_vec_env.py#L48 (together with set_attr and get_attr)

MischaPanch commented 11 months ago

What's the status here, has the issue been fully addressed by now?