Closed swager closed 1 month ago
Hmm, yes, I think all these pseudorandom generators operate with modular arithmetic so the fractional part of a number will never make a difference.
I see how doing runif
by default is maybe a tiny bit unidiomatic from that perspective, but we can't change the default now as that will be a breaking change? I.e, set.seed(42); causal_forest(X,Y,Z)
, will currently produce the same result in all grf versions (ignoring the num.threads bit).
Are you worrying about the chance that runif will cause a rounding collision where you expect to get different result but don't because:
set.seed(something)
runif(1, 0, .Machine$integer.max)) happens to be 42.2
causal_forest(....)
set.seed(something else)
runif(1, 0, .Machine$integer.max)) happens to be 42.4
causal_forest(...) <- same output as above
Since the seed is runif between 0 and .Machine$integer.max that seems very unlikely, or is there another potential bug/unwanted behavior you had in mind?
I completely forgot to paste this, Base R also interprets all seeds as integers:
> set.seed(1)
> runif(1)
[1] 0.2655087
> set.seed(1.4)
> runif(1)
[1] 0.2655087
By default we give non-integer seeds to forests, e.g.:
https://github.com/grf-labs/grf/blob/5a2fbcd51bdeac67cbb85d2407a282d1962550e9/r-package/grf/R/regression_forest.R#L109C31-L109C71
It looks like this doesn't actually matter (i.e., the seed is in fact rounded down and it's OK), but still seems like explicitly using an integer seed ourselves could help avoid a future bug.