Closed Jauntbox closed 4 years ago
Merging #510 into master will decrease coverage by 0.01%. The diff coverage is 100.00%.
```diff
@@            Coverage Diff             @@
##           master     #510      +/-   ##
==========================================
- Coverage   86.72%   86.70%   -0.02%
==========================================
  Files         347      347
  Lines       11860    11830      -30
  Branches      386      382       -4
==========================================
- Hits        10285    10257      -28
+ Misses       1575     1573       -2
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...p/evaluators/OpBinaryClassificationEvaluator.scala | 82.60% <100.00%> (ø) | |
| ...aluation/ExtendedBinaryClassificationMetrics.scala | 100.00% <100.00%> (ø) | |
| ...es/src/main/scala/com/salesforce/op/OpParams.scala | 85.71% <0.00%> (-4.09%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Strong +1 for using reflection in the interim instead of copy-and-pasting. And let's make a PR to Spark.
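For illustration, the interim reflection approach could look roughly like the sketch below: a small helper that reaches into a private member of Spark's `BinaryClassificationMetrics` rather than duplicating the class. The helper name and the field name accessed are assumptions, not actual TransmogrifAI or Spark identifiers.

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

object ReflectionSketch {
  // Hypothetical helper: read a private field off an instance via Java reflection.
  // Whether the field exists under this name in a given Spark version is an
  // assumption; reflection like this is brittle across upgrades.
  def privateField[T](instance: AnyRef, name: String): T = {
    val field = instance.getClass.getDeclaredField(name)
    field.setAccessible(true)
    field.get(instance).asInstanceOf[T]
  }

  def exposeInternals(metrics: BinaryClassificationMetrics): AnyRef =
    // "cumulativeCounts" is a placeholder field name for this sketch.
    privateField[AnyRef](metrics, "cumulativeCounts")
}
```

The trade-off is the usual one: reflection avoids a fork of the Spark class but silently breaks if the private internals are renamed, which is why upstreaming a change to Spark is the better long-term fix.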
**Related issues**
N/A

**Describe the proposed solution**
The initial thresholded confusion matrix for binary classification metrics was added in https://github.com/salesforce/TransmogrifAI/pull/492 by selectively overriding Spark's BinaryClassificationMetrics class to expose only the thresholded confusion matrices. Doing so required performing the calculation a second time.
Unfortunately, this second calculation is non-deterministic. I haven't pinned down exactly which step is responsible, but I suspect the partitioning, since a mapPartitions is used. This means we need to go back to the original plan of copying the entire Spark class, since that's the only way to guarantee that the confusion matrices agree with the rest of the thresholded metrics.
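To illustrate why two independent passes can disagree: when `BinaryClassificationMetrics` is constructed with `numBins > 0`, it downsamples the score/label pairs per partition, so the resulting threshold grid can depend on how the RDD happens to be partitioned. The sketch below (not TransmogrifAI code; class and constructor are real Spark MLlib API, the scenario is illustrative) shows two metrics objects built from the same data but different partitionings:

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.rdd.RDD

object NonDeterminismSketch {
  // scoreAndLabels: (predicted score, true label) pairs.
  def compareThresholds(scoreAndLabels: RDD[(Double, Double)]): Unit = {
    // Same data, same numBins, but different physical partitionings.
    val m1 = new BinaryClassificationMetrics(scoreAndLabels, numBins = 100)
    val m2 = new BinaryClassificationMetrics(scoreAndLabels.repartition(8), numBins = 100)

    // Because binning happens per partition, the two threshold grids are not
    // guaranteed to match. A confusion matrix computed from m2 therefore need
    // not line up with precision/recall taken from m1.
    val t1 = m1.thresholds().collect().toSeq
    val t2 = m2.thresholds().collect().toSeq
    // t1 and t2 may differ, which is the disagreement described above.
  }
}
```

This is why exposing the confusion matrices from the *same* computation that produces the other thresholded metrics (i.e., copying the whole class) guarantees consistency, while a second independent pass does not.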
**Describe alternatives you've considered**
Reimplementing the other thresholded metrics amounts to copy/pasting the entire class, so it isn't really an alternative.
**Additional context**
N/A