Closed ninehusky closed 2 years ago
@ninehusky can you see what happens when you run the tests on a single thread (cargo test -- --test-threads=1)? See: https://doc.rust-lang.org/book/ch11-02-running-tests.html#running-tests-in-parallel-or-consecutively
I suspect what is happening is this: cargo test runs tests in parallel, so multiple tests which use TVM get started at the same time. The first time TVM is used, it performs some kind of one-time initialization that sets up the /root/.tvm/tophub directory. When multiple tests trigger this initialization in parallel, they race to see which thread creates the directory first.
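To make the suspected race concrete, here is a minimal Rust sketch of the general pattern. It is not TVM's code (the real logic lives in TVM's Python tophub.py), and the names ensure_dir and create_concurrently are made up for illustration. The idea is that a plain "create the directory" call fails for every thread that loses the race, unless "already exists" is treated as success:

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;

/// Hypothetical helper: create `dir`, treating "already exists" as success
/// so that a caller that loses the race to another thread does not fail.
fn ensure_dir(dir: &Path) -> io::Result<()> {
    match fs::create_dir(dir) {
        Ok(()) => Ok(()),
        // Someone else created the directory first; that is fine.
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists && dir.is_dir() => Ok(()),
        // Anything else (permissions, missing parent, ...) is a real error.
        Err(e) => Err(e),
    }
}

/// Hypothetical driver: have `workers` threads try to create the same
/// directory at once and collect each thread's result.
fn create_concurrently(dir: &Path, workers: usize) -> Vec<io::Result<()>> {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let dir = dir.to_path_buf();
            thread::spawn(move || ensure_dir(&dir))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

With a bare fs::create_dir in place of ensure_dir, all but one of the threads would be expected to get an AlreadyExists error, which is the kind of failure a parallel test run could hit.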
If that's the case, we'll probably need to find a way to trigger that setup before running the tests.
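One possible way to do that from the Rust side is std::sync::Once, which runs a closure at most once even when called from many threads. This is only a sketch under assumptions: init_tvm_once is a hypothetical helper that each TVM-using test would call first, and the body of the closure is a placeholder for whatever actually triggers TVM's first-use setup. The SETUP_RUNS counter exists only to show that the closure runs a single time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

// Guards the one-time setup for all tests in this test binary.
static TVM_SETUP: Once = Once::new();
// Illustration only: counts how many times the setup closure actually ran.
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical helper: call at the top of every test that uses TVM.
/// The first caller runs the setup; concurrent callers block until it
/// finishes, and later callers return immediately.
fn init_tvm_once() {
    TVM_SETUP.call_once(|| {
        // Placeholder: trigger TVM's first-use initialization here.
        SETUP_RUNS.fetch_add(1, Ordering::SeqCst);
    });
}
```

Note that a Once is only shared between tests that run as threads of the same test binary, so each test binary would still perform its own setup.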
Oh, lol, this has already been fixed: https://github.com/apache/tvm/commit/bf20107ffe6e96e20125a2209500668777095337
I was looking at tophub.py, the file from which the error is triggered. It seemed like the error had already been anticipated and fixed there, so I checked the git blame and found the above commit, in which someone fixed the issue.
So to fix this issue we should just need to update TVM. This may be an easy fix; I'll give it a go right now.
On a clean copy of the repository, running the commands above works on the first iteration.
However, subsequent runs of the test suite produce the following output:
Sometimes clearing the Docker cache and rebuilding the image fixes this issue, but for some reason it doesn't always work.
We should look into this!