Closed MichaelChirico closed 3 years ago
AFAIR DT does not do a double pass, only R does. It makes sense to use an ordinary mean, and if it is not available, then IMO it should be made available.
I see. I think I was mistaken in thinking GForce does a double pass. But I found this open issue:
You can do a 2-pass algorithm using rollmean with n equal to the length of x, and algo exact :)
I am looking at the benchmark, and the performance of spark on a task involving only mean aggregation stands out. Is Spark by chance not doing the error-correction double pass that's done in R and data.table as well? If so, that would seem to give an (IMO) unfair advantage to tools that give numerically inferior results.
At the least this could be pointed out somewhere (I don't see it mentioned anywhere in the repo thus far).
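
For anyone unfamiliar with the double pass being discussed, here is a minimal sketch of the idea in Python. The function names `naive_mean` and `two_pass_mean` are made up for illustration, and this is not the actual code of R, data.table, or Spark. It also assumes plain 64-bit doubles, whereas R's own mean accumulates in extended precision where available, so the numbers only show the effect of the correction pass, not what any of the benchmarked tools would return.

```python
import math


def naive_mean(xs):
    """Single pass: accumulate a running sum, then divide once."""
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)


def two_pass_mean(xs):
    """Two passes: take the naive mean, then add the mean of the
    residuals (x - mean) as a correction for accumulated rounding."""
    n = len(xs)
    first = naive_mean(xs)
    residual = 0.0
    for x in xs:
        residual += x - first
    return first + residual / n


# A small input where the single pass visibly loses the small values:
# 1e16 + 1.0 rounds back to 1e16 in double precision.
data = [1e16, 1.0, 1.0, 1.0, 1.0]

print(naive_mean(data))                # 2000000000000000.0
print(two_pass_mean(data))             # 2000000000000000.8 (exactly ...000.75)
print(math.fsum(data) / len(data))     # reference value, matches the two-pass result
```

The second pass costs another full scan of the data, which is why skipping it can matter in a timing benchmark.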