Closed leahmcguire closed 5 years ago
Merging #404 into master will decrease coverage by
0.01%
. The diff coverage is100%
.
@@ Coverage Diff @@
## master #404 +/- ##
==========================================
- Coverage 86.96% 86.95% -0.02%
==========================================
Files 337 337
Lines 11054 11060 +6
Branches 361 591 +230
==========================================
+ Hits 9613 9617 +4
- Misses 1441 1443 +2
Impacted Files | Coverage Δ | |
---|---|---|
...orce/op/stages/impl/tuning/OpCrossValidation.scala | 97.95% <100%> (+0.28%) |
:arrow_up: |
...es/src/main/scala/com/salesforce/op/OpParams.scala | 85.71% <0%> (-4.09%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 95a77b1...6687a34. Read the comment docs.
@tovbinm @gerashegalov I discovered some fun behavior using algebird implicit for maps:
val test = Seq(Map("a" -> 1, "b" -> 1), Map("a" -> 1, "b" -> 1)) val combined = test.reduce( + ) // Map("a" -> 2, "b" -> 2)) as expected
val test = Seq(Map("a" -> 0, "b" -> 0), Map("a" -> 0, "b" -> 0)) val combined = test.reduce( + ) // Map() :-(
Think I may need to write out the reduce function...
@leahmcguire Map behavior is as expected under monoid rules. In order to keep the map keys with 0 values use semigroup for map values. Example for Map[String, Long]
below:
import com.twitter.algebird._
import com.twitter.algebird.Operators._
implicit val longSemigroup = Semigroup.from[Long](_ + _)
implicit val mapLongMonoid = Monoid.mapMonoid[String, Long](longSemigroup)
// works!
(Map("a" -> 0L) + Map("b" -> 0L)) shouldBe Map("a" -> 0L, "b" -> 0L)
Related issues GLM optimizer is not very stable so sometimes runs will fail in some CV folds but not others - this was causing metrics and grids to not be computed correctly since we assumed that the same number of parameter grids successfully run in each cross validation fold
Describe the proposed solution throw out grids which are not in every cv run (since they are likely to fail in some modeling runs)
Describe alternatives you've considered use grids with metrics in a map to make sure that the correct metrics are combined in CV and simply take the best metric
Additional context Add any other context about the changes here.