Slater-Victoroff closed this issue 7 years ago
I did not claim a statistically significant difference in accuracy, except for Spark's RF/GBM.
Oh, in my previous comment I thought you were commenting in the bench-ML repo.
Yeah, here I'm just checking that the accuracies agree to within ~1% (definitely less than 2%) as a sanity check that nothing is totally wrong. The dataset is too small to establish any statistically significant difference (and the result could differ on another dataset anyway).
Assuming accurate model creation and sufficient benchmarking, there should be no accuracy delta between different libraries creating the same models. The results you've presented are all far below the threshold of showing any real difference between models, and I'd assert that they're well within noise.
I'd suggest either removing them or confirming them, but given the size of the differences you're reporting, the chance of them being statistically significant is very close to zero.
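To illustrate the "within noise" point, here's a minimal back-of-the-envelope sketch using a binomial approximation. The test-set size and accuracies below are hypothetical placeholders, not numbers from the benchmark; a paired test like McNemar's on the actual predictions would be the stricter check.

```python
import math

n = 1000                       # hypothetical test-set size
acc_a, acc_b = 0.962, 0.971    # hypothetical accuracies of two libraries

# Standard error of the difference between two independent accuracy
# estimates, treating each as a binomial proportion over n examples.
se_diff = math.sqrt(acc_a * (1 - acc_a) / n + acc_b * (1 - acc_b) / n)

# Two-sample z-statistic for the accuracy delta.
z = (acc_b - acc_a) / se_diff
print(f"delta = {acc_b - acc_a:.3f}, z = {z:.2f}")
```

With these numbers z comes out well below 1.96, i.e. even a ~1% accuracy delta on a 1,000-example test set can't be distinguished from sampling noise at the usual 5% level.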