h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

10bn row result has more-than-just-by-chance same values in X1 and Y1 #9866

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Run the 10bn-row tests with the given seeds via CreateFrame, then select random rows from the result. X1 and X2 sometimes contain the same value, which looks suspicious. This didn't happen when the data was loaded from file, so something may be off with CreateFrame or with its random number generation. (H2O is supposed to use PCG, which is what I plugged in at the C level to create the files, but I should look and check.)
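One way to put a number on "more than just by chance" is to compare the observed fraction of rows where X1 equals X2 against the rate expected for two independent columns. The sketch below is only an illustration under stated assumptions: it uses plain Python lists rather than an H2O frame, Python's built-in generator (Mersenne Twister, not PCG), and a hypothetical column cardinality of 1,000 distinct integer values. The helper names `match_rate` and `expected_match_rate` are made up for this example.

```python
import random


def match_rate(col_a, col_b):
    """Fraction of rows where the two columns hold the same value."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have equal length")
    if not col_a:
        return 0.0
    matches = sum(1 for a, b in zip(col_a, col_b) if a == b)
    return matches / len(col_a)


def expected_match_rate(cardinality):
    """Chance that two independent uniform draws over `cardinality` values agree."""
    return 1.0 / cardinality


# Hypothetical stand-in for rows sampled from the generated frame:
# 100,000 rows, two integer columns uniform over 1,000 values (assumed).
rng = random.Random(42)
n_rows, cardinality = 100_000, 1_000
x1 = [rng.randrange(cardinality) for _ in range(n_rows)]
x2 = [rng.randrange(cardinality) for _ in range(n_rows)]

observed = match_rate(x1, x2)
expected = expected_match_rate(cardinality)
print(f"observed={observed:.5f} expected={expected:.5f}")
```

If the observed rate on the sampled CreateFrame rows sits far above the expected rate, while the file-loaded data stays close to it, that would point at the generator or its seeding rather than at coincidence.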

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-2937
Assignee: Matt Dowle
Reporter: Matt Dowle
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A