Open suzannejin opened 7 months ago
This will be checked in #73. A solution similar to the one used for shuffling in #70 could work: run the pipeline once, save the results, and check that later pipeline runs produce the same results. I believe this can also be done with nf-test.
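A minimal sketch of the save-and-compare idea, assuming the pipeline writes its results as files under one output directory. The helper names `hash_outputs` and `diff_runs` are hypothetical and not part of the pipeline; a byte-level checksum is only one possible notion of "same results".

```python
import hashlib
from pathlib import Path


def hash_outputs(out_dir):
    """Map each file's path (relative to out_dir) to its SHA-256 digest."""
    out_dir = Path(out_dir)
    digests = {}
    for path in sorted(out_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(out_dir).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_runs(reference, current):
    """Return the sorted relative paths that are missing, extra, or differ."""
    return sorted(
        name
        for name in set(reference) | set(current)
        if reference.get(name) != current.get(name)
    )
```

The digests of the first run could be stored (for example as JSON) and compared against each later run; an empty list from `diff_runs` means every file matched byte for byte.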
Linked to #40
PR #166 sets the basis for testing reproducibility through the debug mode. The point is that this issue is much bigger than just checking whether outputs are identical, because how close to reproducible you are likely depends on the amount of data, the size of the model, how long learning takes to converge, and the complexity of the problem.
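If bit-identical outputs turn out to be too strict, one option is to compare summary metrics within a tolerance. The sketch below assumes each run can be reduced to a dict of float metrics; the function name `metrics_close` and the default tolerances are hypothetical and would need tuning per model and dataset.

```python
import math


def metrics_close(reference, current, rel_tol=1e-6, abs_tol=0.0):
    """True if both runs report the same metric names and every value
    agrees within the given tolerances."""
    if reference.keys() != current.keys():
        return False
    return all(
        math.isclose(reference[k], current[k], rel_tol=rel_tol, abs_tol=abs_tol)
        for k in reference
    )
```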
random sampling
There are many random sampling methods, including random.sample and other low-level sampling inside libraries. Setting
random.seed(0)
at the very beginning of a script won't work on its own, since it only seeds Python's built-in random module and not the generators that other libraries use internally.

set operations
Sets are unordered, so anything handled with sets will not follow a fixed order, and this cannot be controlled. However, set operations are very efficient.
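For the random sampling point, one possible approach is to pass a dedicated generator around explicitly instead of relying on the global state. This is only a sketch using the standard library; `sample_ids` is a hypothetical helper, and libraries with their own generators (NumPy, PyTorch, and so on) would still need to be seeded separately.

```python
import random


def sample_ids(items, k, seed):
    """Draw k items with a dedicated generator, independent of the
    global random state."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input's iteration order
    return rng.sample(sorted(items), k)
```

Because the generator is created from the seed inside the function, calls elsewhere to `random.seed` or `random.random` do not change the result.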
Alternatives?
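Two candidate alternatives, sketched under the assumption that the items are hashable: deduplicate with `dict.fromkeys` (dicts preserve insertion order since Python 3.7), and use sets only for membership tests, never for iteration. The helper names below are hypothetical. Where the original order does not matter, `sorted(set(items))` is another simple option.

```python
def unique_in_order(items):
    """Deduplicate while keeping first-seen order."""
    return list(dict.fromkeys(items))


def ordered_intersection(left, right):
    """Items of `left` that also occur in `right`, in `left`'s order."""
    right_set = set(right)  # used for membership tests only, never iterated
    return [x for x in unique_in_order(left) if x in right_set]
```

This keeps the fast lookups that make sets attractive while the output order is fully determined by the input order.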