Open tensor-works opened 1 year ago
After looking at this a bit longer, I could identify that the issue lies with the setting batch_mode=complete_episodes. With this setting, for some runs, depending on the batch size (I did not find the exact relation here), the learner never receives the episode-finish signal. I do not know if this is intended behaviour. I apologize for my poor English grammar.
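To make the setting concrete, this is roughly the kind of config I mean; a minimal sketch, where the env name and batch size are placeholders rather than my exact values:

```python
from ray.rllib.algorithms.a3c import A3CConfig

# Sketch only: "my_multi_agent_env" and the batch size are placeholders.
config = (
    A3CConfig()
    .environment("my_multi_agent_env")
    # The problematic setting: sample whole episodes instead of truncated chunks.
    .rollouts(batch_mode="complete_episodes")
    .training(train_batch_size=200)
)
algo = config.build()
```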
Sorry for deprioritizing this issue! But we are very close to moving A3C (and some other algos) into a new "RLlib contrib" repo, so support for this algorithm will be very limited.
Hi @sven1977, when I install Ray with
pip install -U "ray[all]"==2.9.0
I don't see "rllib-contrib" as part of the installation.
What happened + What you expected to happen
I am currently running trials with the A3C algorithm in an episodic environment. Since the horizon option has been removed in the 2.3.0 build, it periodically happens that environments do not finish. This only occurs with A3C and not with any other algorithm; even actor-critic methods such as PPO do not show this behaviour. The result is lost trials that cannot be evaluated and wasted computational resources.
This is an example of an erroneous run:
This is my step function:
This is my reset function:
Please note that I emulate the functioning of gymnasium's TimeLimit wrapper, which is advertised as a solution for this issue (see the sketch below for the kind of logic I mean). However, I am using a multi-agent env, while the TimeLimit wrapper is for single-agent envs only. I am still not fully confident that this is not a mistake I am making myself.
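For illustration, a minimal sketch of emulating TimeLimit-style truncation inside an RLlib MultiAgentEnv (not my actual env; agent IDs, spaces, and the step budget are placeholders):

```python
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TimeLimitedMultiAgentEnv(MultiAgentEnv):
    """Sketch: emulate gymnasium's TimeLimit for a multi-agent env."""

    def __init__(self, config=None):
        super().__init__()
        self._agent_ids = {"agent_0", "agent_1"}  # placeholder agent ids
        self.max_episode_steps = 200              # placeholder step budget
        self._elapsed_steps = 0
        # Placeholder per-agent spaces; a real env defines its own.
        self.observation_space = gym.spaces.Dict(
            {aid: gym.spaces.Box(-1.0, 1.0, (4,)) for aid in self._agent_ids}
        )
        self.action_space = gym.spaces.Dict(
            {aid: gym.spaces.Discrete(2) for aid in self._agent_ids}
        )

    def reset(self, *, seed=None, options=None):
        self._elapsed_steps = 0
        return self.observation_space.sample(), {}

    def step(self, action_dict):
        self._elapsed_steps += 1
        obs = self.observation_space.sample()
        rewards = {aid: 0.0 for aid in action_dict}

        # Natural termination of the underlying task (placeholder: never).
        terminateds = {aid: False for aid in self._agent_ids}
        terminateds["__all__"] = False

        # TimeLimit emulation: truncate all agents once the step budget is hit,
        # so the episode always ends even without a natural terminal state.
        truncated = self._elapsed_steps >= self.max_episode_steps
        truncateds = {aid: truncated for aid in self._agent_ids}
        truncateds["__all__"] = truncated

        return obs, rewards, terminateds, truncateds, {}
```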
This is a parameter.json from an erroneous run:
Versions / Dependencies