Open exalate-issue-sync[bot] opened 1 year ago
Arno Candel commented: The threshold metrics table, with all the metrics for every threshold, is also available from R via mymodel@model$training_metrics@metrics$thresholds_and_metric_scores.
The idx column is used by various helpers to look up the corresponding confusion matrices (CMs) for a given threshold (mapped via the idx), so it has to stay in the model output.
Whether idx should be printed by a model's show() method is a different question. Maybe we can remove that column from show() and only show it in summary(model), but that's a micro-optimization at this point, IMO.
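To illustrate what "mapped via the idx" means, here is a minimal sketch in plain Python. It assumes, as a simplification, that idx is simply the row position in the thresholds table and that each row stores the confusion-matrix counts for that threshold. The table contents, the column names, and the helper confusion_matrix_for are all hypothetical; this is not H2O's actual data structure or API.

```python
# Hypothetical, simplified stand-in for the thresholds_and_metric_scores table:
# one row per threshold; a row's position in this list is its idx.
thresholds_table = [
    {"threshold": 0.84, "tps": 10, "fps": 1, "tns": 80, "fns": 9},
    {"threshold": 0.40, "tps": 15, "fps": 6, "tns": 75, "fns": 4},
    {"threshold": 0.07, "tps": 19, "fps": 40, "tns": 41, "fns": 0},
]

# Hypothetical max-metrics rows: each one records which threshold row (idx)
# maximizes that metric, so the matching CM can be rebuilt on demand.
max_metrics = [
    {"metric": "precision", "threshold": 0.84, "idx": 0},
    {"metric": "tpr", "threshold": 0.07, "idx": 2},
]


def confusion_matrix_for(idx):
    """Return the 2x2 confusion matrix [[tn, fp], [fn, tp]] for threshold row idx."""
    row = thresholds_table[idx]
    return [[row["tns"], row["fps"]], [row["fns"], row["tps"]]]


for m in max_metrics:
    # idx is the only link from a max-metric row back to its full CM.
    print(m["metric"], m["threshold"], confusion_matrix_for(m["idx"]))
```

Dropping idx from the stored table would break this lookup, which is why the discussion below separates "keep it in the model output" from "print it in show()".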
Nidhi Mehta commented: If the object that idx refers to is not displayed in the model object, then idx should be removed from it. So either the idx column should be removed from the model object, or a reference should be given to where the whole giant threshold metric table can be found, i.e. mymodel@model$training_metrics@metrics$thresholds_and_metric_scores.
In my opinion, mymodel@model$training_metrics@metrics$thresholds_and_metric_scores is an absurdly long call. A separate function or value would be more useful.
Neeraja Madabhushi commented: rpeck thinks it's fine to show.
Nidhi Mehta commented: I don't think it is fine to show it without a reference in the model object. Build a model in R, print the model object, and you see this idx column and wonder what it stands for.
Arno Candel commented: It's very difficult not to show it without a special-case hack in the R/Python clients. We would need to add per-column importance levels that all clients would respect, just for the show() method.
The problem is that R needs that field to display all 10-15 of the most important CMs (one per max-metric threshold), and advanced users might want to see those idx values as well, just not by default.
Raymond Peck commented: I'll leave this in the backlog and we can revisit it later. At some point we'll probably need a "don't display this column" notion in the TwoDimTable.
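A "don't display this column" notion could look roughly like the sketch below: the table keeps every column in its data, and only the rendering step skips columns flagged as hidden. The class DisplayTable, its hidden parameter, and its methods are hypothetical illustrations written in plain Python; they are not the API of H2O's TwoDimTable.

```python
class DisplayTable:
    """Hypothetical minimal table: data keeps every column; render() skips hidden ones."""

    def __init__(self, header, rows, hidden=()):
        self.header = list(header)
        self.rows = [list(r) for r in rows]
        self.hidden = set(hidden)  # column names to omit from default printing

    def visible_columns(self):
        # Indices of columns that are not flagged as hidden.
        return [i for i, name in enumerate(self.header) if name not in self.hidden]

    def render(self):
        # Default, show()-style output: tab-separated, hidden columns omitted.
        cols = self.visible_columns()
        lines = ["\t".join(self.header[i] for i in cols)]
        for row in self.rows:
            lines.append("\t".join(str(row[i]) for i in cols))
        return "\n".join(lines)

    def column(self, name):
        # Programmatic access still returns hidden columns such as idx.
        j = self.header.index(name)
        return [row[j] for row in self.rows]


table = DisplayTable(
    ["metric", "threshold", "value", "idx"],
    [["f1", 0.295569, 0.572772, 205], ["f2", 0.091965, 0.767978, 253]],
    hidden={"idx"},
)
print(table.render())
print(table.column("idx"))
```

The design choice is that hiding is purely a presentation flag: helpers that need idx keep working, while default printing no longer confuses users.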
JIRA Issue Migration Info
Jira Issue: PUBDEV-979
Assignee: Arno Candel
Reporter: Nidhi Mehta
State: Reopened
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Build any binary-response model in H2O and print the model object. The max metric table contains an 'idx' column that references the threshold metrics in Flow; it has no reference or use for an R user and should be removed.
Maximum Metrics:
    metric                  threshold  value            idx
 1  f1                      0.295569   0.572772         205
 2  f2                      0.091965   0.767978         253
 3  f0point5                0.384837   0.478801         144
 4  accuracy                0.510589   0.606192         58
 5  precision               0.840633   0.625850         0
 6  absolute_MCC            0.405469   0.122578         128
 7  min_per_class_accuracy  0.401394   0.559573         131
 8  tns                     0.840633   27501840.000000  0
 9  fns                     0.840633   18205435.000000  0
10  fps                     0.066128   27502115.000000  255
11  tps                     0.066128   18205895.000000  255
12  tnr                     0.840633   0.999990         0
13  fnr                     0.840633   0.999975         0
14  fpr                     0.066128   1.000000         255
15  tpr                     0.066128   1.000000         255