ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.51k stars 5.69k forks source link

[RLlib] Add syntax checking to configuration string literals or migrate to enums. #39384

Open tbukic opened 1 year ago

tbukic commented 1 year ago

Description

Ray is very configurable framework and has a lot of configuration options which are string literals. Those have to be manually typed by library users, risking subtle bugs without error messages. Checking if configuration literal is allowed, or making it an Enum and delegating checking from the execution to the static code analysis will help eliminate those bugs and improve the development experience.

Use case

Example: Value of evaluation_duration_unit can be episodes or timesteps. Despite episodes is mentioned as default value in the documentation, if one makes a non-obvious typo episode - library doesn't warn the user for error and it acts with non-default option, as if timesteps was given.

sven1977 commented 1 year ago

Hey @tbukic , thanks for raising this issue! :) As of Ray 2.0, you should use RLlib's AlgorithmConfig classes, which do all sorts of type checking on the config attributes as well as the configured values early on, but also deeper within the code. Here is a link to the docs where the usage of these configs is explained in detail: https://docs.ray.io/en/latest/rllib/rllib-training.html#configuring-rllib-algorithms

tbukic commented 1 year ago

Hey @sven1977 , thank you for the fast response.

Just a small clarification regarding AlgorithmConfig. Here is the example of code which has created the problem (btw. algorithm was PPO):

config: AlgorithmConfig = get_trainable_cls(cfg.algorithm).get_default_config()
...
config.evaluation( 
        evaluation_interval=1,
        evaluation_duration=1,
        evaluation_duration_unit='episode',
 )

While the error in this particular example might have leaked through some AlgorithmConfig's checks, I was pretty confident that the my bug was hid by Algorithm's constructions ( example: algorithm.py#L1295-L1303 as linked above, but this is not the only instance) like:

if unit == "episodes":
    ...
else:
    ... # assumes: unit == "timesteps"
sven1977 commented 1 year ago

You are right, there is no good pre-check right now, even in AlgorithmConfig for some settings. Yeah, these probably fell through the cracks. We'll have to go through the class again and make sure that e.g. these binary settings (one string or another) are checked properly (or yes, solved with enums).