MilesCranmer closed this issue 7 years ago.
Here is another Travis CI failure that also occurs intermittently and that I also cannot reproduce:
FAIL: test_data_sizes (test_block.TestFFTBlock)
Test that different number of bits give correct throughput size
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_block.py", line 286, in test_data_sizes
    self.assertEqual(number_fftd, number_copied)
AssertionError: 163072 != 113920
I should note that I can't reproduce this locally with CUDA-enabled Bifrost. I have not yet tried with CUDA disabled. Setting the NOCUDA flag might somehow trigger this issue.
I should also note that number_copied in this second Travis failure changes from run to run (among the runs where the failure occurs).
Update: I ran the test suite twice on a local CPU-only Bifrost docker container, and all (CPU) tests passed. I do not know why Travis is having difficulty with this.
Apparently there is a way to run a Travis instance locally: https://quay.io/organization/travisci. I will try this.
Not sure if it's relevant in this case, but one way I found to debug/induce race conditions is to add a time.sleep(random.random()) into the middle of the TransformBlock definition.
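A minimal, self-contained sketch of the random-sleep trick, assuming a generic shared counter as a hypothetical stand-in for state shared between pipeline blocks (it does not use any Bifrost API). The `jitter()` call widens the gap between a read and a write, so a missing lock shows up as lost updates far more often than it would without the sleep. The sleep bound is scaled down from the original `random.random()` (up to one second) to keep the demo quick.

```python
import random
import threading
import time


def jitter(max_seconds=0.01):
    """Sleep for a random fraction of max_seconds to perturb thread scheduling.

    Debug-only helper; the suggestion above used time.sleep(random.random()),
    i.e. up to one second. A smaller bound keeps this demo fast.
    """
    time.sleep(random.random() * max_seconds)


class SharedCounter:
    """Hypothetical stand-in for state shared between pipeline blocks."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value       # read
        jitter()                   # widen the gap between read and write
        self.value = current + 1   # write: may overwrite another thread's update

    def safe_increment(self):
        with self._lock:           # the lock makes the read-modify-write atomic
            current = self.value
            jitter()
            self.value = current + 1


def run_concurrently(func, n_threads=8):
    """Start n_threads threads that each call func, then wait for all of them."""
    threads = [threading.Thread(target=func) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    racy, locked = SharedCounter(), SharedCounter()
    run_concurrently(racy.unsafe_increment)
    run_concurrently(locked.safe_increment)
    print("unsafe:", racy.value, "safe:", locked.value)
```

With the jitter in place, the unsafe counter typically ends well below 8 and varies between runs, much like number_copied above, while the locked counter always reaches 8.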
The plot thickens: a moment ago, I reproduced the array sizing error locally on my MacBook. This error occurs on every execution on this machine, rather than ~1/2 the time.
FWIW I get the FFT failure sometimes on my machine.
Was it with the CPU version of Bifrost? The GPU version passes the unit tests reliably for me.
Update: I have finally gotten a local travis-ci instance up and running. All tests pass, every time. I still have not been able to reproduce these errors locally.
Closing this as tests seem to be stable now. I believe these issues were solved by a combination of fixing bugs and skipping flaky tests. A couple of the relevant commits: https://github.com/ledatelescope/bifrost/commit/a5448da899fc56eb891f09d61b7498d96da8cb6f https://github.com/ledatelescope/bifrost/commit/fa6b40b09b6014ef446147111e3edc2fafed1724
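For reference, here is one way a flaky test can be skipped conditionally with the standard `unittest.skipIf` decorator. This is a hypothetical sketch, not what the linked commits necessarily do: it assumes the CI environment sets `TRAVIS=true` (Travis CI does export this variable), and the counts in the test body are placeholders rather than real Bifrost output.

```python
import os
import unittest

# Assumption: Travis CI sets TRAVIS=true in the build environment.
ON_TRAVIS = os.environ.get("TRAVIS") == "true"


class TestFFTBlockSketch(unittest.TestCase):
    """Hypothetical test case; not the real test_block.TestFFTBlock."""

    @unittest.skipIf(ON_TRAVIS, "flaky on Travis CI; see issue discussion")
    def test_data_sizes(self):
        # Placeholder values standing in for the counts the real test compares.
        number_fftd = 113920
        number_copied = 113920
        self.assertEqual(number_fftd, number_copied)
```

Running it with `python -m unittest` reports the test as skipped on Travis and executes it everywhere else, so the check still runs locally.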
I'm seeing the following issue in Travis builds about half the time. I can't reproduce this locally.