ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Bug] Neural MMO Tests #21088

Open jsuarez5341 opened 2 years ago

jsuarez5341 commented 2 years ago

Ray Component

RLlib

What happened + What you expected to happen

I have been maintaining a big, fancy multiagent simulator dependent on RLlib for the past few years. Every time a new Ray version comes out, several (sometimes dozens of) new bugs break basic functionality. It is impossible for me to submit repro scripts for each individual issue because:

  1. The bug occurs in the context of Neural MMO, a platform using most of Ray Tune and RLlib's features concurrently
  2. These bugs always throw incomprehensible internal RLlib errors with unhelpful traces
  3. I have about a 25% success rate of producing a repro script, even when spending an entire day on a single bug
  4. Ray development moves too fast for me to reasonably test each nightly/minor version, so typically I can only narrow down the point of introduction to an entire version release (sometimes multiple)

I am unaware of any other projects like Neural MMO that ferret out as many bugs. I also have a vested interest in RLlib working with my platform.

Here's my proposal: Add Neural MMO smoke tests to RLlib. Just checking whether training for a couple of epochs crashes will catch tons of bugs.
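The proposed smoke test could be sketched roughly as below. The helper names here (`run_smoke_test`, `dummy_step`) are illustrative stand-ins, not real RLlib APIs; in an actual RLlib test, the per-epoch callable would run a training iteration on an algorithm configured with the Neural MMO env, and the test would only assert that no exception is raised.

```python
# Minimal sketch of a crash-only smoke test: run a few training epochs and
# pass as long as nothing raises. Hypothetical helper names, for illustration.

def run_smoke_test(trainer_step, num_epochs=2):
    """Call `trainer_step` for `num_epochs` epochs; any exception fails the test."""
    for epoch in range(num_epochs):
        trainer_step(epoch)  # a crash here is exactly what the smoke test catches
    return True

def dummy_step(epoch):
    # Toy stand-in for one training iteration; a real test would call RLlib here.
    return {"epoch": epoch, "episode_reward_mean": 0.0}

assert run_smoke_test(dummy_step)
```

The point of this style of test is intentionally low-bar: it asserts nothing about learning quality, only that the full Tune/RLlib machinery survives a couple of epochs without an internal error.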

To give you an idea of how much this will improve RLlib: multi-GPU, `simple_optimizer=False`, APPO, IMPALA, evaluation worker `.foreach` methods, render worker instantiation, and raylet termination are all currently broken in master. I hope to have the opportunity to help with all of these, but I am unable to do so through repro scripts on dummy environments.

Versions / Dependencies

v1.5.2-master

Reproduction script

As per above, the point of this post is to establish new tests.

Anything else

No response

Are you willing to submit a PR?

sven1977 commented 2 years ago

Hey @jsuarez5341 , this really is a great idea! I'm in full support of this effort and would love to help with the PR. One question I have is: Should we start by writing a quick (randomized) env emulator using our RandomMultiAgentEnv? Or would that not cover most of Neural MMO's capacity?

I have this abandoned PR here, which has a new example script mocking a multi-agent env with many dynamically added/removed agents per episode. Could you take a quick look and let me know whether this is useful?
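A mock env of the kind described, where agents are dynamically added and removed mid-episode, might look roughly like this. This is a self-contained toy sketch with no Ray dependency; the class name and details are hypothetical, and a real RLlib version would subclass `MultiAgentEnv` and return proper observation/action spaces.

```python
import random

class MockDynamicMultiAgentEnv:
    """Toy mock of a multi-agent env (dict-keyed API, one entry per live agent)
    whose agent set changes during an episode, as in Neural MMO. Illustrative
    only; not a real RLlib class."""

    def __init__(self, max_agents=4, episode_len=10, seed=0):
        self.max_agents = max_agents
        self.episode_len = episode_len
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.agents = {f"agent_{i}" for i in range(2)}
        return {a: self.rng.random() for a in self.agents}

    def step(self, actions):
        self.t += 1
        # Randomly spawn or remove an agent mid-episode.
        if self.rng.random() < 0.5 and len(self.agents) < self.max_agents:
            self.agents.add(f"agent_{self.t + self.max_agents}")
        elif len(self.agents) > 1 and self.rng.random() < 0.3:
            self.agents.discard(next(iter(self.agents)))
        obs = {a: self.rng.random() for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}
        dones = {a: False for a in self.agents}
        dones["__all__"] = self.t >= self.episode_len
        return obs, rewards, dones, {}
```

The key property the mock exercises is that the set of keys in the returned dicts changes between steps, which is exactly what trips up code paths that assume a fixed agent roster.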

jsuarez5341 commented 2 years ago

@sven1977 Yes, this is useful -- my suggestion would be to do both. This would give you a good idea of the gap between your unit tests, your in-house integration tests (this mockup MMO), and actual research/applications on the platform.

Here are the core things NMMO uses off the top of my head. Mind you, this is not an exhaustive list, and there's a strong possibility that replicating just these outside of a real application will result in a significant coverage gap:

How's this sound for testing: