Open ghost opened 1 year ago
Do you mean it happened in test_mcdta.py?
No. This one on the network_7link fixture:
Will the pull request #61 help resolve this issue?
Unfortunately no. See https://github.com/maccmu/macposts/actions/runs/8738763208/job/23978496228#step:6:20
If you have a macOS environment (I do not), it might be better to check there.
I found these sources:
https://stefanoborini.com/c-stdlib-stdsrandstdrand-are-not-repeatable-across-platforms/
https://stackoverflow.com/questions/64680033/rand-behaves-differently-between-macos-and-linux
Perhaps we can use <random> instead of <cstdlib>? Just a thought.
I am not sure if this is really the cause; only one of the four tests fails, which is just too deterministic if it is caused by rand. The symptom here is quite similar to the one fixed by e481254. So I think it might be due to differences in undefined/implementation-defined behaviors in the standard library (libc++ on macOS vs libstdc++ on GNU/Linux), or some use-after-free or uninitialized memory blocks.
Anyway, I do agree that it is a good idea to avoid rand, which is often of low quality. I was considering a custom PCG generator, which is easy to implement and fast, and its quality should suffice for our use cases. Using <random> is also good because of the better integration with STL. However, please be sure to use and maintain some global random state as in https://github.com/maccmu/macposts/blob/main/macposts/_ext/utils.h, so that we can explicitly control it.
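For reference, a minimal sketch of what a PCG32 (XSH-RR variant) generator with one explicitly controlled global state could look like. The real generator would live in the C++ extension; the Python below only illustrates the algorithm, and the names PCG32, set_seed, and rand_u32 are hypothetical, not taken from utils.h.

```python
# Sketch of a PCG32 (XSH-RR) generator with a single global state.
# All names here are hypothetical and only serve the illustration.

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULTIPLIER = 6364136223846793005  # 64-bit LCG multiplier used by PCG32


class PCG32:
    def __init__(self, seed, stream=0):
        # Seeding follows the usual PCG scheme: fix the (odd) increment,
        # advance once, mix in the seed, then advance once more.
        self.state = 0
        self.inc = ((stream << 1) | 1) & MASK64
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self):
        old = self.state
        # Advance the underlying 64-bit LCG (wrap-around emulated by masking).
        self.state = (old * MULTIPLIER + self.inc) & MASK64
        # Output permutation: xorshift the high bits, then rotate right.
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32


# One global random state, reseeded explicitly so that runs are repeatable.
_global_rng = PCG32(seed=0)


def set_seed(seed):
    global _global_rng
    _global_rng = PCG32(seed=seed)


def rand_u32():
    return _global_rng.next_u32()


if __name__ == "__main__":
    set_seed(42)
    first = [rand_u32() for _ in range(3)]
    set_seed(42)
    second = [rand_u32() for _ in range(3)]
    print(first == second)  # the same seed reproduces the same sequence
```

Because the generator uses only integer arithmetic, the same seed should give the same sequence on every platform. Note that if <random> is used instead, engines such as std::mt19937 are fully specified by the standard, but the distribution classes (for example std::uniform_int_distribution) are implementation-defined and may produce different values across standard libraries.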
I tested network_7link with test_dta.py on macOS. I printed out the in_ccs of all runs. Interestingly, their numbers of rows can differ, which seems to suggest that the DNL runs end at different intervals since total_interval = -1.
Just a quick question: in testing multiclass cases, why do we only compare the first rows of the CC, instead of the entire CC matrices as in single-class cases? The first row is the CC at interval 0, which is usually all 0s.
If I remove [1] here, only network_3link of the four tests succeeds on macOS. I didn't test them on other platforms.
It was an oversight. Corrected in 4a93e403434cc8eaaae0a890ab9fc89f58ca4fba. Thanks for catching this!
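To illustrate why comparing only the first rows can hide a real difference, here is a small sketch with made-up cumulative curves. The values and the helper names same_first_row and same_matrix are hypothetical and do not come from the test suite.

```python
# Sketch: a first-row-only comparison versus a whole-matrix comparison.
# The CC values below are invented for illustration only.

def same_first_row(cc_a, cc_b):
    # Compares only interval 0, where the counts are usually all zeros.
    return cc_a[0] == cc_b[0]


def same_matrix(cc_a, cc_b):
    # Requires the same number of rows and exact equality in every row.
    return len(cc_a) == len(cc_b) and all(
        row_a == row_b for row_a, row_b in zip(cc_a, cc_b)
    )


# Two hypothetical runs: they agree at interval 0 but diverge later,
# and the second run even ends one interval later than the first.
run_a = [[0.0, 0.0], [3.0, 1.0], [5.0, 2.0]]
run_b = [[0.0, 0.0], [2.0, 1.0], [5.0, 2.0], [5.0, 3.0]]

if __name__ == "__main__":
    print(same_first_row(run_a, run_b))  # the weak check passes
    print(same_matrix(run_a, run_b))     # the full check does not
```

Exact equality is used on purpose here, since the point of a reproducibility test is that two runs with the same seed produce identical results rather than merely close ones.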
It is actually a more severe issue than I thought, and perhaps it has been there since 046f43b45872e708ff86cef009e8a0ebd20c36df. (I can confirm that before the merge all four tests worked on GNU/Linux at least.)
I will have a look at this now that I can observe the failures in my local environment.
In the single-class network_7link case, if I change adaptive_ratio to 0 or 1, it succeeds. So there might be some issue with the hybrid routing.
Commit 85eeb31f79f198918b7d5595a5c101d3141212f5 fixed this on Linux and Windows. On macOS three tests still fail. I do not have a macOS environment. Maybe you could have a look at it?
Still no good on macOS. I will take a look.
Update: They actually fail on all platforms. See the output from pytest -k 'repro' --runxfail. (Fixed on Linux and Windows platforms in 85eeb31f79f198918b7d5595a5c101d3141212f5.)
Currently we mark reproducibility tests as "xfail" on macOS. See: https://github.com/maccmu/macposts/blob/9148b06406980384b41f36e96021c7f66add7bed/tests/test_dta.py#L8-L11
Currently on macOS: