salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars 392 forks source link

loco test bug fix #423

Open AdamChit opened 4 years ago

AdamChit commented 4 years ago

Related issues test refactor made in https://github.com/salesforce/TransmogrifAI/pull/412 causes one of the test cases to fail.

Specially this test assumed that at-least 1 of the 4 features would be selected https://github.com/salesforce/TransmogrifAI/blob/master/core/src/test/scala/com/salesforce/op/stages/impl/insights/RecordInsightsLOCOTest.scala#L236

Describe the proposed solution Test should allow the case where none of the features are picked

Additional context example of the test failing:

Screen Shot 2019-10-15 at 9 24 36 PM
codecov[bot] commented 4 years ago

Codecov Report

Merging #423 (b4a2116) into master (ac83ad7) will decrease coverage by 60.42%. The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #423       +/-   ##
===========================================
- Coverage   86.94%   26.52%   -60.43%     
===========================================
  Files         340      337        -3     
  Lines       11388    11131      -257     
  Branches      363      593      +230     
===========================================
- Hits         9901     2952     -6949     
- Misses       1487     8179     +6692     
Impacted Files Coverage Δ
...main/scala/com/salesforce/op/dsl/RichFeature.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...main/scala/com/salesforce/op/filters/Summary.scala 0.00% <0.00%> (-100.00%) :arrow_down:
.../scala/com/salesforce/op/cli/gen/ProblemKind.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...n/scala/com/salesforce/op/dsl/RichSetFeature.scala 0.00% <0.00%> (-100.00%) :arrow_down:
.../scala/com/salesforce/op/dsl/RichListFeature.scala 0.00% <0.00%> (-100.00%) :arrow_down:
.../scala/com/salesforce/op/stages/impl/package.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...cala/com/salesforce/op/cli/gen/FileInProject.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...scala/org/apache/spark/util/SparkThreadUtils.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...cala/com/salesforce/op/OpWorkflowModelWriter.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...cala/com/salesforce/op/evaluators/Evaluators.scala 0.00% <0.00%> (-100.00%) :arrow_down:
... and 201 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ac83ad7...b6829fe. Read the comment docs.

AdamChit commented 4 years ago

Why would it not pick any features?

So in the test the labels are generated based on the PickList feature:

        p.value match {
          case Some("A") | Some("B") | Some("C") => RealNN(1.0)
          case _ => RealNN(0.0)
        }
      )

And all the features have a chance of being empty:

      val countryData: Seq[Country] = RandomText.countries.withProbabilityOfEmpty(0.3).take(numRows).toList
      val pickListData: Seq[PickList] = RandomText.pickLists(domain = List("A", "B", "C", "D", "E", "F", "G"))
        .withProbabilityOfEmpty(0.1).limit(numRows)
      val currencyData: Seq[Currency] = RandomReal.logNormal[Currency](mean = 10.0, sigma = 1.0)
        .withProbabilityOfEmpty(0.3).limit(numRows)

One case that could happen: 2) All the feature are set to empty 3) LOCO changes the pickList from empty to non-empty and because it is not in the set {A,B,C} it is NOT a strong of a predictor => label wouldn't change => pickList would not have a strong insight => No feature is selected

Note: the chance of this happening is very low which explains why it doesn't always appear in the CI build.

tovbinm commented 4 years ago

@AdamChit are you still working on adding a more robust test?

tovbinm commented 4 years ago

any updates? @AdamChit