h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

h2oR summary: displaying no labels in summary #13990

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

On master (H2O build git hash 2ed208c6adbae3864b7cdbc410655a3b3c2f0dd3):

ntr = h2o.uploadFile(h,"/Users/nidhimehta/Desktop/german/trn.csv", destination_frame="ntr")

summary(ntr)
C1    C2             C3    C4    C5             C6    C7    C8             C9
:273  Min.   : 4.00  :376  :197  Min.   :  276  :427  :244  Min.   :1.000  :386
:197  1st Qu.:12.00  :200  :157  1st Qu.: 1342  :120  :168  1st Qu.:2.000  :216
:183  Median :18.00  : 66  :131  Median : 2250  : 77  :126  Median :3.000  : 64
: 47  Mean   :20.65  : 30  : 68  Mean   : 3182  : 42  :118  Mean   :2.973  : 34
: NA  3rd Qu.:24.00  : 28  : 65  3rd Qu.: 3911  : 34  : 44  3rd Qu.:4.000  : NA
: NA  Max.   :72.00  : NA  : 40  Max.   :15945  : NA  : NA  Max.   :4.000  : NA

C10   C11            C12   C13            C14   C15   C16            C17
:633  Min.   :1.000  :238  Min.   :19.00  :570  :503  Min.   :1.000  :445
: 37  1st Qu.:2.000  :199  1st Qu.:27.00  : 98  :119  1st Qu.:1.000  :137
: 30  Median :3.000  :154  Median :33.00  : 32  : 78  Median :1.000  :105
: NA  Mean   :2.809  :109  Mean   :35.32  : NA  : NA  Mean   :1.393  : 13
: NA  3rd Qu.:4.000  : NA  3rd Qu.:41.00  : NA  : NA  3rd Qu.:2.000  : NA
: NA  Max.   :4.000  : NA  Max.   :75.00  : NA  : NA  Max.   :4.000  : NA

C18            C19   C20   C21
Min.   :1.000  :422  :674  Min.   :1.000
1st Qu.:1.000  :278  : 26  1st Qu.:1.000
Median :1.000  : NA  : NA  Median :1.000
Mean   :1.149  : NA  : NA  Mean   :1.296
3rd Qu.:1.000  : NA  : NA  3rd Qu.:2.000
Max.   :2.000  : NA  : NA  Max.   :2.000

Expected:

otrn = read.csv(file="/Users/nidhimehta/Desktop/german/trn.csv", header=T)
summary(otrn)
C1       C2             C3       C4           C5             C6       C7
A11:183  Min.   : 4.00  A30: 28  A43    :197  Min.   :  276  A61:427  A71: 44
A12:197  1st Qu.:12.00  A31: 30  A40    :157  1st Qu.: 1351  A62: 77  A72:118
A13: 47  Median :18.00  A32:376  A42    :131  Median : 2253  A63: 42  A73:244
A14:273  Mean   :20.65  A33: 66  A49    : 68  Mean   : 3182  A64: 34  A74:126
         3rd Qu.:24.00  A34:200  A41    : 65  3rd Qu.: 3913  A65:120  A75:168
         Max.   :72.00           A46    : 40  Max.   :15945
                                 (Other): 42

C8             C9       C10      C11            C12      C13
Min.   :1.000  A91: 34  A101:633  Min.   :1.000  A121:199  Min.   :19.00
1st Qu.:2.000  A92:216  A102: 30  1st Qu.:2.000  A122:154  1st Qu.:27.00
Median :3.000  A93:386  A103: 37  Median :3.000  A123:238  Median :33.00
Mean   :2.973  A94: 64            Mean   :2.809  A124:109  Mean   :35.32
3rd Qu.:4.000                     3rd Qu.:4.000            3rd Qu.:41.00
Max.   :4.000                     Max.   :4.000            Max.   :75.00

C14       C15       C16            C17       C18            C19
A141: 98  A151:119  Min.   :1.000  A171: 13  Min.   :1.000  A191:422
A142: 32  A152:503  1st Qu.:1.000  A172:137  1st Qu.:1.000  A192:278
A143:570  A153: 78  Median :1.000  A173:445  Median :1.000
                    Mean   :1.393  A174:105  Mean   :1.149
                    3rd Qu.:2.000            3rd Qu.:1.000
                    Max.   :4.000            Max.   :2.000

C20       C21
A201:674  Min.   :1.000
A202: 26  1st Qu.:1.000
          Median :1.000
          Mean   :1.296
          3rd Qu.:2.000
          Max.   :2.000
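For reference, the labelled counts and the "(Other)" row in the expected output come from base R. `summary()` on a data frame uses `maxsum = 7` by default, so a factor column with more than seven levels, such as C4 here, is reported as its six most frequent levels plus an "(Other)" total. A minimal base-R sketch of that rule on a made-up factor (the data is illustrative only):

```r
# Toy factor with 8 levels: a appears 8 times, b 7 times, ..., h once (made-up data)
x <- factor(rep(letters[1:8], times = 8:1))

# summary.data.frame() passes maxsum = 7 on to summary.factor(); with more than
# 7 levels, the 6 most frequent are kept and the rest are summed into "(Other)"
s <- summary(x, maxsum = 7)
print(s)
# s holds a = 8, b = 7, c = 6, d = 5, e = 4, f = 3 and (Other) = 3 (g + h)
```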

exalate-issue-sync[bot] commented 1 year ago

Spencer Aiello commented: Just pushed to my branch; will close once it is merged to master.

Added the ability to set the number of factor levels you'd like reported by the summary:

summary(fr, factors=50) will attempt to retrieve the 50 most frequent levels of every factor column while computing the summary (the default is 6).
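The per-column behavior described above can be approximated in plain R. The helper `top_levels()` is hypothetical: it is not part of the h2o package and is not how H2O computes its summary. It keeps the `n` most frequent levels of a factor and appends a count of missing values, mirroring the `NA` row in the H2O output. The tail numbers are made up for illustration.

```r
# Hypothetical helper (not part of h2o): approximate a "factors = n" summary
# for one factor column by keeping the n most frequent levels plus an NA count
top_levels <- function(x, n = 6L) {
  tbl <- table(x)                                  # table() excludes NA by default
  counts <- setNames(as.integer(tbl), names(tbl))  # plain named integer vector
  counts <- sort(counts, decreasing = TRUE)        # most frequent levels first
  top <- counts[seq_len(min(n, length(counts)))]   # keep at most n levels
  c(top, "NA" = sum(is.na(x)))                     # append the missing-value count
}

# Made-up tail numbers: N912UA x3, N316AW x2, N509DC x1, two missing values
tail_num <- factor(c("N912UA", "N912UA", "N912UA", "N316AW", "N316AW",
                     "N509DC", NA, NA))
res <- top_levels(tail_num, n = 2)
print(res)
```

Here `res` holds N912UA = 3, N316AW = 2 and NA = 2. Unlike base R's `summary()`, levels beyond the first `n` are simply dropped rather than lumped into an "(Other)" row.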

Example with the airlines dataset:

summary(fr)
Year Month DayofMonth DayOfWeek
1 Min. :1987 Min. : 1.000 Min. : 1.0 Min. :1.000
2 1st Qu.:1992 1st Qu.: 1.000 1st Qu.: 6.0 1st Qu.:2.000
3 Median :1998 Median : 1.000 Median :14.0 Median :4.000
4 Mean :1998 Mean : 1.409 Mean :14.6 Mean :3.821
5 3rd Qu.:2003 3rd Qu.: 1.000 3rd Qu.:23.0 3rd Qu.:5.000
6 Max. :2008 Max. :10.000 Max. :31.0 Max. :7.000
7
DepTime CRSDepTime ArrTime CRSArrTime
1 Min. : 1.0 Min. : 0.0 Min. : 1 Min. : 0
2 1st Qu.: 927.4 1st Qu.: 908.6 1st Qu.:1117 1st Qu.:1107
3 Median :1328.2 Median :1319.2 Median :1525 Median :1515
4 Mean :1345.8 Mean :1313.2 Mean :1505 Mean :1485
5 3rd Qu.:1733.8 3rd Qu.:1718.1 3rd Qu.:1916 3rd Qu.:1902
6 Max. :2400.0 Max. :2359.0 Max. :2400 Max. :2359
7 NA's :1086 NA's :1195
UniqueCarrier FlightNum TailNum ActualElapsedTime
1 US:18729 Min. : 1.0 UNKNOW : 179 Min. : 16.0
2 UA: 9434 1st Qu.: 202.4 000000 : 124 1st Qu.: 71.0
3 WN: 6170 Median : 553.9 �NKNO�: 114 Median :101.0
4 HP: 3451 Mean : 818.8 0 : 66 Mean :124.8
5 PS: 3212 3rd Qu.:1241.0 N912UA : 59 3rd Qu.:151.0
6 DL: 935 Max. :3949.0 N316AW : 56 Max. :475.0
7 NA :16024 NA's :1195
CRSElapsedTime AirTime ArrDelay DepDelay
1 Min. : 17 Min. : 14.0 Min. :-63.000 Min. :-16.00
2 1st Qu.: 71 1st Qu.: 61.0 1st Qu.: -6.000 1st Qu.: -2.00
3 Median :102 Median : 91.0 Median : 2.000 Median : 1.00
4 Mean :125 Mean :114.3 Mean : 9.317 Mean : 10.01
5 3rd Qu.:151 3rd Qu.:140.0 3rd Qu.: 14.000 3rd Qu.: 10.00
6 Max. :437 Max. :402.0 Max. :475.000 Max. :473.00
7 NA's :13 NA's :16649 NA's :1195 NA's :1086
Origin Dest Distance TaxiIn
1 DEN:3558 PHX:9317 Min. : 11.0 Min. : 0.000
2 PIT:3241 PHL:4482 1st Qu.: 323.0 1st Qu.: 3.000
3 ORD:2246 PIT:3020 Median : 537.7 Median : 5.000
4 BUR:2021 ORD:2103 Mean : 730.2 Mean : 5.381
5 CLT:1781 CLT:1542 3rd Qu.: 916.9 3rd Qu.: 6.000
6 PHL:1632 DEN:1470 Max. :3365.0 Max. :128.000
7 NA's :35 NA's :16026
TaxiOut Cancelled CancellationCode Diverted
1 Min. : 0.00 Min. :0.00000 B : 93 Min. :0.000000
2 1st Qu.: 9.00 1st Qu.:0.00000 A : 81 1st Qu.:0.000000
3 Median : 12.00 Median :0.00000 C : 47 Median :0.000000
4 Mean : 14.17 Mean :0.02469 NA:43757 Mean :0.002479
5 3rd Qu.: 16.00 3rd Qu.:0.00000 3rd Qu.:0.000000
6 Max. :254.00 Max. :1.00000 Max. :1.000000
7 NA's :16024
CarrierDelay WeatherDelay NASDelay SecurityDelay
1 Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. : 0.00000
2 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.00000
3 Median : 0.000 Median : 0.0000 Median : 0.000 Median : 0.00000
4 Mean : 4.048 Mean : 0.2894 Mean : 4.855 Mean : 0.01702
5 3rd Qu.: 0.000 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 0.00000
6 Max. :369.000 Max. :201.0000 Max. :323.000 Max. :14.00000
7 NA's :35045 NA's :35045 NA's :35045 NA's :35045
LateAircraftDelay IsArrDelayed IsDepDelayed
1 Min. : 0.00 YES:24441 YES:23091
2 1st Qu.: 0.00 NO :19537 NO :20887
3 Median : 0.00
4 Mean : 7.62
5 3rd Qu.: 0.00
6 Max. :373.00
7 NA's :35045

Setting factors to more than 6:

summary(fr[,"TailNum"], factors=10)
TailNum
1 UNKNOW : 179
2 000000 : 124
3 �NKNO�: 114
4 0 : 66
5 N912UA : 59
6 N316AW : 56
7 N509DC : 55
8 N922UA : 54
9 N913UA : 53
10 N160AW : 51
11 NA :16024

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-1008
Assignee: Spencer Aiello
Reporter: Nidhi Mehta
State: Resolved
Fix Version: N/A
Attachments: N/A
Development PRs: N/A