salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars 392 forks source link

Expose derived feature value in model insights #484

Closed TuanNguyen27 closed 4 years ago

TuanNguyen27 commented 4 years ago

Some of the derived features (or engineered features) have human-readable values that could be exposed inside modelInsights for downstream consumption.

codecov[bot] commented 4 years ago

Codecov Report

Merging #484 into master will increase coverage by 0.02%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #484      +/-   ##
==========================================
+ Coverage   87.01%   87.03%   +0.02%     
==========================================
  Files         345      345              
  Lines       11680    11683       +3     
  Branches      378      387       +9     
==========================================
+ Hits        10163    10168       +5     
+ Misses       1517     1515       -2     
Impacted Files Coverage Δ
...c/main/scala/com/salesforce/op/ModelInsights.scala 93.09% <100.00%> (+0.04%) :arrow_up:
...force/op/stages/impl/feature/OPMapVectorizer.scala 97.79% <100.00%> (+0.01%) :arrow_up:
...es/src/main/scala/com/salesforce/op/OpParams.scala 89.79% <0.00%> (+4.08%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6d90ba6...daa5e9f. Read the comment docs.

TuanNguyen27 commented 4 years ago

LGTM, are there other places where the descriptor value should be passed on?

I did a spot check, i think only OpMapVec was missing this.

TuanNguyen27 commented 4 years ago

Yes, what happened with the OpMapVec? Did we miss adding descriptorValue? Also, maybe add one test in ModelInsight when a descriptor value is being added.

There's this line and this line that check for non-empty derivedFeatureValue. However we don't have a test for derivedFeatureValue when it is derived from a Map feature. Is it okay if i just add a check for derivedFeatureValue when it comes from a Map feature ?

TuanNguyen27 commented 4 years ago

@leahmcguire @michaelweilsalesforce It's actually tricky to write a test for this change, because the existing tests drop numericMap as a feature...Do you have any suggestions ?