salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Standalone minimum variance estimator #463

Closed: clin-projects closed this 4 years ago

clin-projects commented 4 years ago

Related issues: N/A

Describe the proposed solution: A standalone unary estimator that performs a minimum variance filter on derived features. Shared functionality moves out of SanityChecker into a DerivedFeatureFilterUtils object.
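The filtering step itself only looks at the feature columns, which is why it can live in a unary estimator. A minimal sketch of the idea in plain Scala, not TransmogrifAI's MinVarianceFilter: it works on in-memory rows rather than Spark DataFrames, and the names `MinVarianceSketch`, `keptColumns`, `filterColumns` and `minVariance` are hypothetical. It assumes sample variance (n - 1 denominator) and keeps a column when its variance is at least the threshold; the library's variance estimator and boundary convention may differ.

```scala
// Hypothetical sketch of a label-free minimum variance filter.
// Not the TransmogrifAI implementation: plain Scala collections, no Spark.
object MinVarianceSketch {

  /** Sample variance of one column (n - 1 denominator); 0.0 for fewer than 2 values. */
  def sampleVariance(xs: Seq[Double]): Double = {
    val n = xs.size
    if (n < 2) 0.0
    else {
      val mean = xs.sum / n
      xs.map(x => (x - mean) * (x - mean)).sum / (n - 1)
    }
  }

  /** Indices of columns whose variance is at least minVariance (assumed convention). */
  def keptColumns(rows: Seq[Array[Double]], minVariance: Double): Seq[Int] = {
    val numCols = rows.headOption.map(_.length).getOrElse(0)
    (0 until numCols).filter(j => sampleVariance(rows.map(_(j))) >= minVariance)
  }

  /** Returns the kept column indices and each row projected onto those columns. */
  def filterColumns(rows: Seq[Array[Double]], minVariance: Double): (Seq[Int], Seq[Array[Double]]) = {
    val kept = keptColumns(rows, minVariance)
    (kept, rows.map(r => kept.map(j => r(j)).toArray))
  }

  def main(args: Array[String]): Unit = {
    // No label column anywhere: only features are needed.
    val rows = Seq(
      Array(1.0, 5.0, 0.0),
      Array(2.0, 5.0, 0.0),
      Array(3.0, 5.0, 1.0)
    )
    val (kept, filtered) = filterColumns(rows, minVariance = 1e-5)
    println(s"kept columns: ${kept.mkString(", ")}")
    filtered.foreach(r => println(r.mkString("[", ", ", "]")))
  }
}
```

With the rows in `main`, column 1 is constant (variance 0) and is dropped, while columns 0 (variance 1) and 2 (variance 1/3) are kept.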

Describe alternatives you've considered:

Alternative 1: Gated Params

Alternative 2: Minimal Wrapper Function

Additional context: We need a minimum variance filter in an unsupervised (i.e., label-less) setting. SanityChecker already has a minimum variance filter, but it is a BinaryEstimator and expects a (response, features) pair as input.

codecov[bot] commented 4 years ago

Codecov Report

Merging #463 into master will increase coverage by <.01%. The diff coverage is 92.47%.


@@            Coverage Diff             @@
##           master     #463      +/-   ##
==========================================
+ Coverage   86.97%   86.98%   +<.01%     
==========================================
  Files         341      344       +3     
  Lines       11507    11576      +69     
  Branches      374      370       -4     
==========================================
+ Hits        10008    10069      +61     
- Misses       1499     1507       +8
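The totals in the Coverage Diff are consistent with line coverage being hits divided by tracked lines, with misses = lines - hits. A small Scala check, assuming that definition; the object and method names are illustrative only:

```scala
// Illustrative check of the Coverage Diff totals, assuming
// coverage = hits / lines (as a percentage) and misses = lines - hits.
object CoverageCheck {
  def coveragePct(hits: Int, lines: Int): Double = 100.0 * hits / lines

  def main(args: Array[String]): Unit = {
    // (hits, lines) copied from the Coverage Diff for master and #463
    val runs = Seq("master" -> (10008, 11507), "#463" -> (10069, 11576))
    for ((name, (hits, lines)) <- runs) {
      val misses = lines - hits
      println(f"$name%-6s coverage=${coveragePct(hits, lines)}%.2f%% misses=$misses")
    }
  }
}
```

This prints 86.97% with 1499 misses for master and 86.98% with 1507 misses for #463, matching the report; the difference is about 0.0085 percentage points, hence "+<.01%".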
Impacted Files                                         Coverage Δ
...tages/impl/preparators/SanityCheckerMetadata.scala  88.97% <0%> (-0.73%) ↓
...rce/op/stages/impl/preparators/SanityChecker.scala  91.48% <100%> (-0.11%) ↓
...cala/com/salesforce/op/dsl/RichVectorFeature.scala  72.72% <100%> (+6.06%) ↑
...op/stages/impl/preparators/MinVarianceFilter.scala  91.3% <91.3%> (ø)
...s/impl/preparators/MinVarianceFilterMetadata.scala  91.66% <91.66%> (ø)
...s/impl/preparators/DerivedFeatureFilterUtils.scala  92.76% <92.76%> (ø)
...es/src/main/scala/com/salesforce/op/OpParams.scala  85.71% <0%> (-4.09%) ↓
.../op/features/types/FeatureTypeSparkConverter.scala  98.24% <0%> (-0.88%) ↓
... and 1 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update fbad63d...c92d1cc.

clin-projects commented 4 years ago

@leahmcguire @tovbinm All comments addressed. Let me know if further changes are needed, or whether we can merge, thanks!!!

tovbinm commented 4 years ago

Great work!