Non-deterministic end-to-end test failures.

Describe the bug The end to end tests for training and evaluation occasionally fail or timeout, especially when running on Github actions. It's difficult to reproduce this behavior locally. The failures seem to occur most on tests which use async processing + the buffer. This leads me to believe that there is a concurrency control bug (e.g. deadlock) occurring.

The workaround for this bug is to just re-run the tests.

To Reproduce Occasionally can reproduce when running GitHub Actions workflow. E.g. https://github.com/marius-team/marius/actions/runs/3056399004/jobs/4930521831

I have not observed async processing bugs when running on large-scale datasets, only on the tiny-scale datasets used for testing.

The main challenge will be isolating and identifying the issue. My approach will be to run a highly asynchronous configuration on a small dataset, which will hopefully recreate the conditions needed for the concurrency bug to arise.

Environment Occurs on both Linux and MacOS

marius-team / marius

Non-deterministic end-to-end test failures. #113