Closed — neubig closed this issue 1 year ago
The cause of the randomness may be that we didn't fix the random seed.
I think a possible fix is: NumPy provides `SeedSequence` functionality, which allows us to hierarchically initialize random seeds across the whole library. I think it would be better for every class that calculates something to take a `SeedSequence` argument in `__init__`.
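To make the idea concrete, here is a minimal sketch of hierarchical seeding with NumPy's `SeedSequence`: a single root sequence is spawned into independent child sequences, one per component, and the whole hierarchy is reproducible from the root seed. The component count and seed value are arbitrary for illustration.

```python
import numpy as np

# Parent SeedSequence for the whole library; components spawn children from it.
root = np.random.SeedSequence(12345)

# Each component gets its own statistically independent child sequence.
rngs = [np.random.default_rng(s) for s in root.spawn(3)]
samples = [rng.integers(0, 100, size=4) for rng in rngs]

# Re-spawning from the same root seed reproduces the exact same streams.
root2 = np.random.SeedSequence(12345)
rngs2 = [np.random.default_rng(s) for s in root2.spawn(3)]
samples2 = [rng.integers(0, 100, size=4) for rng in rngs2]
assert all((a == b).all() for a, b in zip(samples, samples2))
```

Passing a child `SeedSequence` into each class's `__init__` would make every run reproducible while keeping the per-component streams independent.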
That seems reasonable to me.
I'm seeing these failures with high probability. Re-running integration tests just to make a PR green is time-consuming. Is someone already working on a fix?
We have a temporary fix of making the test more lenient that is mixed in with this PR: https://github.com/neulab/ExplainaBoard/pull/532
It might be worth pulling out just that fix and making it a separate PR.
@neubig I think we should create a separate PR to fix this flaky test as soon as possible, given that the PR is still in review.
I agree. I'm not able to do it now, but I welcome anyone else who can.
@neubig I'll create a PR to cherry-pick the change in the test in #532. I will add comments to #532, too.
Plan to apply the change:

- Add a `seed` argument to `Metric.__init__`, defaulting to `None`. If `seed` is `None`, it is initialized randomly.
- Add `Metric.get_seed` to return the seed that the object holds.
- Use `get_seed`.

Assigning to me as per the offline discussion with @odashi.
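The plan above could be sketched roughly as follows. This is a hypothetical minimal `Metric`, not ExplainaBoard's actual class: the real interface has more methods and fields, and the way the random seed is drawn when `seed is None` is my assumption.

```python
import numpy as np

class Metric:
    """Sketch of seed handling only; not the real ExplainaBoard Metric API."""

    def __init__(self, seed=None):
        # If no seed is given, draw one randomly so it can still be reported.
        if seed is None:
            seed = np.random.SeedSequence().entropy
        self._seed = seed
        self._rng = np.random.default_rng(seed)

    def get_seed(self):
        # Return the seed the object holds, so a failing run can be replayed.
        return self._seed
```

A test that fails could then log `metric.get_seed()` and reconstruct the exact same random stream with `Metric(seed=logged_seed)`.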
We observed the following test failure when integrating another PR:
We are not sure whether this is an issue with the test or with the underlying code, but as a temporary measure we reduced the sensitivity of the test. We should go back and check whether the failure is just due to bootstrapping variance or due to a bug in the test itself.
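One way to check the bootstrapping-variance hypothesis is to rerun the resampling under many seeds and look at the spread of the resulting interval endpoints. The sketch below is generic (a percentile bootstrap for a sample mean), not the actual ExplainaBoard test; the data, resample count, and seed range are illustrative assumptions.

```python
import numpy as np

def bootstrap_mean_ci(data, n_resamples=1000, alpha=0.05, seed=None):
    """Percentile-bootstrap confidence interval for the mean of `data`."""
    rng = np.random.default_rng(seed)
    # Resample indices with replacement: one row per bootstrap replicate.
    idx = rng.integers(0, len(data), size=(n_resamples, len(data)))
    means = data[idx].mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

data = np.random.default_rng(0).normal(size=200)
# The endpoints wobble from seed to seed; that wobble alone can flip a
# too-tight assertion threshold without any bug in the code under test.
cis = np.array([bootstrap_mean_ci(data, seed=s) for s in range(20)])
```

If the test's tolerance is smaller than the seed-to-seed spread observed this way, loosening the tolerance (or fixing the seed) is justified; if not, the flakiness points to a real bug.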