huawei-noah / SMARTS

Scalable Multi-Agent RL Training School for Autonomous Driving
MIT License

Pytest cannot be run without `forked` #743

Open. Gamenot opened this issue 3 years ago.

Gamenot commented 3 years ago

BUG REPORT

High Level Description: Judging by the problems the ULTRA team has been having, pytest is unstable when tests are run without `--forked`, while tests run with `--forked` take much longer. This is due to problems with running multiple SMARTS instances.
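The trade-off described above can be illustrated with a minimal, standard-library sketch of per-test process isolation. This is an illustration under assumptions, not pytest-forked's actual implementation, and `run_forked` is a hypothetical helper. Each test runs in a freshly forked child, so a hard crash (a segfault, a fatal X11 error) kills only that child while the parent session keeps going. The price is a `fork()` and a `waitpid()` per test, plus re-creating any per-process state in every child, which is consistent with forked runs being slower.

```python
import os
import signal
from typing import Callable


def run_forked(test_fn: Callable[[], None]) -> int:
    """Run test_fn in a forked child process and report how it ended.

    Returns 0 if the test passed, 1 if it raised an exception, and -N if
    the child was killed by signal N (for example -11 for SIGSEGV).
    POSIX only, because it relies on os.fork().
    """
    pid = os.fork()
    if pid == 0:
        # Child: run the test, then leave via os._exit so that the parent's
        # atexit handlers and buffered output are not run a second time.
        try:
            test_fn()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: block until the child finishes, then decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_forked(lambda: None))  # 0: passed
    print(run_forked(lambda: 1 / 0))  # 1: raised ZeroDivisionError
    # Simulate a hard crash: the child dies, the parent keeps going.
    print(run_forked(lambda: os.kill(os.getpid(), signal.SIGKILL)))  # -9
```

Without isolation, the same crash would take down the whole pytest process (or xdist worker) and every test scheduled after it.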

SMARTS version: branch `ultra-gb-record-density-data`, commit 2611ccc21f95ee6791d5efe443f7a48f388f6489

Previous associated issues: https://github.com/huawei-noah/SMARTS/issues/719, https://github.com/huawei-noah/SMARTS/issues/597, https://github.com/huawei-noah/SMARTS/issues/184

Steps to reproduce the bug: Run the CI on commit 2611ccc21f95ee6791d5efe443f7a48f388f6489, using the branch listed here: https://github.com/huawei-noah/SMARTS/pull/745

Resulting and expected behaviour: `:display:x11display(fatal)` is thrown, whereas all tests should pass.

Error logs and screenshots: https://github.com/huawei-noah/SMARTS/pull/730/, https://github.com/huawei-noah/SMARTS/runs/2279454801

System information

Impact (if known): CI is much slower because the tests have to run with `--forked`.

sah-huawei commented 3 years ago

With PR #747, I was just able to run `make test` without `--forked` on my laptop (although I had to hack in `-n 3` to prevent overload!). All tests passed.

@Gamenot do you think we should remove `--forked` from the `test` rule in the Makefile, or only do that for CI? (The `-n` issue is still annoying when running locally.)

EDIT: Never mind. I get seemingly random failures when I change `-n` or the test ordering. I guess there's still more digging to do!

sah-huawei commented 3 years ago

I suspect the problems arise when two tests that use Ray run side by side. For example, after reordering the tests, I got a segmentation fault in `test_multi_instance_example()` (from `./tests/tests_examples.py`) while `test_rllib_hiway_env()` (from `./env/tests/test_rllib_hiway_env.py`) was running at the same time. Both tests call `ray.init()`.
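If concurrent `ray.init()` calls are indeed the culprit, one possible mitigation (an assumption, not something tried in this thread) is to serialise the Ray-using tests across pytest-xdist workers with an exclusive file lock. The sketch uses `fcntl.flock`, so it is POSIX only; `exclusive_lock` and the lock-file name are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator

# Hypothetical lock-file path; every xdist worker on the machine uses the same one.
RAY_LOCK_PATH = os.path.join(tempfile.gettempdir(), "smarts_ray_tests.lock")


@contextmanager
def exclusive_lock(path: str = RAY_LOCK_PATH) -> Iterator[None]:
    """Hold an exclusive flock() on `path` for the duration of the with-block.

    Blocks while another process (for example another xdist worker) holds
    the lock, so at most one holder runs the guarded code at a time.
    """
    # "a" creates the file if needed without truncating it under another holder.
    with open(path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

In each Ray-using test, wrapping the section from `ray.init()` to `ray.shutdown()` in `with exclusive_lock():` would let only one worker start a local Ray instance at a time, while tests that do not touch Ray still run in parallel. Recent pytest-xdist releases offer a lock-free alternative: `--dist loadgroup` together with the `xdist_group` mark pins all tests in a group to the same worker.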