Pendulumv0 vs Pendulum-v0

EvgeniiaVak commented 2 years ago

I'm reading the book here https://learning.oreilly.com/library/view/learning-ray/9781098117214/ch01.html#section_data_processing and I was trying to run rllib with the provided pendulum.yml and got an error containing this line:

gym.error.Error: Attempted to look up malformed environment ID: b'Pendulumv0'. (Currently all IDs must be of the form ^(?:[\w:-]+\/)?([\w:.-]+)-v(\d+)$.)

The text mentions Pendulum-v0, but the YAML example says Pendulumv0. Judging by the error, Pendulum-v0 is correct.
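For anyone who wants to check an ID before launching a run, here is a minimal sketch that applies the pattern quoted in the error message. The regular expression is copied from the error above; the helper name parse_env_id is only illustrative and is not part of gym.

```python
import re

# Pattern copied from the gym error message above.
ENV_ID_RE = re.compile(r"^(?:[\w:-]+\/)?([\w:.-]+)-v(\d+)$")


def parse_env_id(env_id):
    """Return (name, version) if env_id is well formed, else None.

    Hypothetical helper for illustration; gym does its own validation.
    """
    match = ENV_ID_RE.match(env_id)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_env_id("Pendulumv0"))   # None
print(parse_env_id("Pendulum-v0"))  # ('Pendulum', 0)
```

The missing "-v" separator is what makes Pendulumv0 malformed. The pattern only checks the shape of the ID, not whether that version is actually registered.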

maxpumperla commented 2 years ago

@EvgeniiaVak yeah, thanks so much for reporting this. Not entirely sure how this typo crept in, but you're correct about that. I'm fixing this as we speak. It should always be Pendulum-v0, of course.

If you have more issues like that, please feel free to report! Very much appreciated.

EvgeniiaVak commented 2 years ago

And -v0 is probably not right either, because I got an error with that one too:

gym.error.DeprecatedEnv: Env Pendulum-v0 not found (valid versions include ['Pendulum-v1'])

With Pendulum-v1 it worked.

It's probably impossible to keep the correct version in a book, but I think it would be super useful to have a tip on how to see available up-to-date environments. The error logs have a lot of misleading info, for example, asking to install gym[atari].
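As a rough sketch of such a tip: the set of registered environment IDs can be read from gym's registry. The shape of that registry depends on the installed gym version, so the helper below (list_env_ids, a hypothetical name) assumes one of two shapes: an object with an all() method yielding specs that carry an id attribute, as in older releases, or a plain dict keyed by environment ID, as in newer ones.

```python
def list_env_ids(registry, prefix=""):
    """Return the sorted registered environment IDs that start with prefix.

    Assumes the registry is either an object exposing all() (older gym)
    or a dict-like mapping keyed by environment ID (newer gym).
    """
    if hasattr(registry, "all"):
        ids = [spec.id for spec in registry.all()]
    else:
        ids = list(registry.keys())
    return sorted(env_id for env_id in ids if env_id.startswith(prefix))


# Intended usage, assuming gym is installed:
#   import gym
#   print(list_env_ids(gym.envs.registry, "Pendulum"))
```

This sidesteps the error logs and shows directly which versions of an environment the installed gym knows about.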

maxpumperla commented 2 years ago

Well, this one is a bit difficult. My reasoning was to use an old version "on purpose" so that there would be no questions about cutting-edge versions. Especially for the classical envs, that usually isn't much of a problem.

Let me see what I can do.

EvgeniiaVak commented 2 years ago

OK, with -v1, rllib train -f pendulum.yml ran for a few hours, but episode_len_mean never got to 800:

== Status ==
Current time: 2022-02-09 00:34:59 (running for 04:29:38.19)
Memory usage on this node: 13.4/31.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/6 CPUs, 0/1 GPUs, 0.0/12.98 GiB heap, 0.0/6.49 GiB objects
Result logdir: C:\Users\Ev\ray_results\pendulumppo
Number of trials: 1/1 (1 RUNNING)
+-----------------------------+----------+-----------------+--------+------------------+----------+----------+----------------------+----------------------+--------------------+
| Trial name                  | status   | loc             |   iter |   total time (s) |       ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-----------------------------+----------+-----------------+--------+------------------+----------+----------+----------------------+----------------------+--------------------|
| PPO_Pendulum-v1_c52a5_00000 | RUNNING  | 127.0.0.1:18364 |   8811 |          15784.1 | 35244000 | -151.776 |            -0.375452 |             -371.778 |                200 |
+-----------------------------+----------+-----------------+--------+------------------+----------+----------+----------------------+----------------------+--------------------+

I guess sticking to vetted versions makes sense.

Maybe installing an older version of gym (like here) would solve it? 🤔 If so, which version would it be?

Here is the requirements.txt I have collected up to this point in the book, based on the pip installs it mentions.

QuackDoctor commented 2 years ago

In case this helps: I was able to run this example after noticing that the episode_reward_mean value in the yml file was inconsistent with the one stated below the example yml on page 20 of the PDF book. I set the value to -800 instead of the default +800 in the yml file, and it ran in a few minutes:

episode_reward_mean: -800 # <4>
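To illustrate why the sign matters, here is a minimal sketch of a threshold-style stop check, assuming a trial stops once a reported metric reaches or exceeds its configured value. The should_stop function is a simplified stand-in written for this comment, not Ray's actual implementation. The reward of -151.776 and the episode length of 200 are taken from the status table earlier in this thread.

```python
def should_stop(result, stop):
    """Return True if any metric in `stop` has reached its threshold.

    Simplified stand-in for a stopping criterion; assumes "reached"
    means the reported value is greater than or equal to the threshold.
    """
    return any(
        metric in result and result[metric] >= threshold
        for metric, threshold in stop.items()
    )


# Values reported in the status table above.
latest = {"episode_reward_mean": -151.776, "episode_len_mean": 200}

print(should_stop(latest, {"episode_reward_mean": 800}))   # False
print(should_stop(latest, {"episode_reward_mean": -800}))  # True
```

In the run above even episode_reward_max was negative (-0.375452), so a mean reward of +800 would never be reached, whereas -800 is crossed quickly.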

maxpumperla commented 1 year ago

Apologies for the in-progress errors in the draft. We're slowly wrapping things up, and this example is not in the book anymore. Let me know if you have other concerns. Thanks!

maxpumperla / learning_ray