ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Bug] Neural MMO Tests #21088

Open jsuarez5341 opened 2 years ago

jsuarez5341 commented 2 years ago

Ray Component

RLlib

What happened + What you expected to happen

I have been maintaining a big, fancy multiagent simulator dependent on RLlib for the past few years. Every time a new Ray version comes out, several (sometimes dozens of) new bugs break basic functionality. It is impossible for me to submit repro scripts for each individual issue because:

  1. The bug occurs in the context of Neural MMO, a platform using most of Ray Tune and RLlib's features concurrently
  2. These bugs always throw incomprehensible internal RLlib errors with unhelpful traces
  3. I have about a 25% success rate of producing a repro script, even when spending an entire day on a single bug
  4. Ray development moves too fast for me to reasonably test each nightly/minor version, so typically I can only narrow down the point of introduction to an entire version release (sometimes multiple)

I am unaware of any other projects like Neural MMO that ferret out as many bugs. I also have a vested interest in RLlib working with my platform.

Here's my proposal: Add Neural MMO smoke tests to RLlib. Just checking whether training for a couple of epochs crashes will catch tons of bugs.
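The proposed smoke test could be sketched roughly as below. The helper names here (`run_smoke_test`, `dummy_step`) are illustrative stand-ins, not real RLlib APIs; in an actual RLlib test, the per-epoch callable would run a training iteration on an algorithm configured with the Neural MMO env, and the test would only assert that no exception is raised.

```python
# Minimal sketch of a crash-only smoke test: run a few training epochs and
# pass as long as nothing raises. Hypothetical helper names, for illustration.

def run_smoke_test(trainer_step, num_epochs=2):
    """Call `trainer_step` for `num_epochs` epochs; any exception fails the test."""
    for epoch in range(num_epochs):
        trainer_step(epoch)  # a crash here is exactly what the smoke test catches
    return True

def dummy_step(epoch):
    # Toy stand-in for one training iteration; a real test would call RLlib here.
    return {"epoch": epoch, "episode_reward_mean": 0.0}

assert run_smoke_test(dummy_step)
```

The point of this style of test is intentionally low-bar: it asserts nothing about learning quality, only that the full Tune/RLlib machinery survives a couple of epochs without an internal error.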

To give you an idea of how much this will improve RLlib: multi-GPU, `simple_optimizer=False`, APPO, IMPALA, evaluation worker `.foreach` methods, render worker instantiation, and raylet termination are all currently broken in master. I hope to have the opportunity to help with all of these, but I am unable to do so through repro scripts on dummy environments.

Versions / Dependencies

v1.5.2-master

Reproduction script

As per above, the point of this post is to establish new tests.

Anything else

No response

Are you willing to submit a PR?

sven1977 commented 2 years ago

Hey @jsuarez5341 , this really is a great idea! I'm in full support of this effort and would love to help with the PR. One question I have is: Should we start by writing a quick (randomized) env emulator using our RandomMultiAgentEnv? Or would that not cover most of Neural MMO's capacity?

I have this abandoned PR here, which has a new example script mocking a multi-agent env with many dynamically added/removed agents per episode. Could you take a quick look and let me know whether this is useful?
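A mock env of the kind described, where agents are dynamically added and removed mid-episode, might look roughly like this. This is a self-contained toy sketch with no Ray dependency; the class name and details are hypothetical, and a real RLlib version would subclass `MultiAgentEnv` and return proper observation/action spaces.

```python
import random

class MockDynamicMultiAgentEnv:
    """Toy mock of a multi-agent env (dict-keyed API, one entry per live agent)
    whose agent set changes during an episode, as in Neural MMO. Illustrative
    only; not a real RLlib class."""

    def __init__(self, max_agents=4, episode_len=10, seed=0):
        self.max_agents = max_agents
        self.episode_len = episode_len
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.agents = {f"agent_{i}" for i in range(2)}
        return {a: self.rng.random() for a in self.agents}

    def step(self, actions):
        self.t += 1
        # Randomly spawn or remove an agent mid-episode.
        if self.rng.random() < 0.5 and len(self.agents) < self.max_agents:
            self.agents.add(f"agent_{self.t + self.max_agents}")
        elif len(self.agents) > 1 and self.rng.random() < 0.3:
            self.agents.discard(next(iter(self.agents)))
        obs = {a: self.rng.random() for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}
        dones = {a: False for a in self.agents}
        dones["__all__"] = self.t >= self.episode_len
        return obs, rewards, dones, {}
```

The key property the mock exercises is that the set of keys in the returned dicts changes between steps, which is exactly what trips up code paths that assume a fixed agent roster.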

jsuarez5341 commented 2 years ago

@sven1977 Yes, this is useful -- my suggestion would be to do both. This would give you a good idea of the gap between your unit tests, your in-house integration tests (this mockup MMO), and actual research/applications on the platform.

Here are the core things NMMO uses off the top of my head. Mind you, this is not an exhaustive list, and there's a strong possibility that replicating just these outside of a real application will result in a significant coverage gap:

How's this sound for testing: