Run the following code in Python (or R if you insist):
    import h2o

    h2o.init()  # connect to (or start) a local H2O cluster

    nrow = 10000
    ncol = 100
    seed = 12345
    frac1 = 0.16
    frac2 = 0.2

    # f1: 100 columns, named C1..C100
    f1 = h2o.create_frame(rows=nrow, cols=ncol,
                          real_fraction=frac1, categorical_fraction=frac1,
                          integer_fraction=frac1, binary_fraction=frac1,
                          time_fraction=frac1, string_fraction=frac2,
                          missing_fraction=0.1, has_response=False, seed=seed)

    # f2: a single column, named C1
    f2 = h2o.create_frame(rows=nrow, cols=1,
                          real_fraction=frac1, categorical_fraction=frac1,
                          integer_fraction=frac1, binary_fraction=frac1,
                          time_fraction=frac1, string_fraction=frac2,
                          missing_fraction=0.1, has_response=False, seed=seed)

    # f2's C1 comes first, followed by all columns of f1
    f3 = f2.cbind(f1)
    f3.names
You will find the column names are duplicated: <type 'list'>: [u'C1', u'C10', u'C2', u'C3', u'C4', u'C5', u'C6', u'C7', u'C8', u'C9', u'C10', u'C11', u'C12', u'C13', u'C14', u'C15', u'C16', u'C17', u'C18', u'C19', u'C20', u'C21', u'C22', u'C23', u'C24', u'C25', u'C26', u'C27', u'C28', u'C29', u'C30', u'C31', u'C32', u'C33', u'C34', u'C35', u'C36', u'C37', u'C38', u'C39', u'C40', u'C41', u'C42', u'C43', u'C44', u'C45', u'C46', u'C47', u'C48', u'C49', u'C50', u'C51', u'C52', u'C53', u'C54', u'C55', u'C56', u'C57', u'C58', u'C59', u'C60', u'C61', u'C62', u'C63', u'C64', u'C65', u'C66', u'C67', u'C68', u'C69', u'C70', u'C71', u'C72', u'C73', u'C74', u'C75', u'C76', u'C77', u'C78', u'C79', u'C80', u'C81', u'C82', u'C83', u'C84', u'C85', u'C86', u'C87', u'C88', u'C89', u'C90', u'C91', u'C92', u'C93', u'C94', u'C95', u'C96', u'C97', u'C98', u'C99', u'C100']
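C10 appears twice, at positions 2 and 11 of the combined frame. The second column, which is C1 from f1, appears to have been renamed to C10 to avoid clashing with C1 from f2, and so collides with f1's own C10.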
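To confirm the duplication without reading the list by eye, a small check along these lines can help. It is only a sketch: `find_duplicates` is a hypothetical helper written for this report, not part of the h2o API, and the `names` list is rebuilt by hand from the output above rather than taken from a live `f3.names`.

```python
from collections import defaultdict


def find_duplicates(names):
    """Return {name: [indices]} for every name that occurs more than once.

    Hypothetical helper for illustration; not part of h2o.
    """
    positions = defaultdict(list)
    for index, name in enumerate(names):
        positions[name].append(index)
    return {name: idx for name, idx in positions.items() if len(idx) > 1}


# Column names as reported above: C1, C10, then C2 through C100.
names = ["C1", "C10"] + ["C%d" % i for i in range(2, 101)]

dupes = find_duplicates(names)
print(len(names), len(set(names)))  # total names vs. distinct names
print(dupes)                        # zero-based indices of each repeated name
```

With a running cluster, passing `f3.names` instead of the hand-built list should give the same result if the bug reproduces.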