salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars, 393 forks

added call of validation prepare before model selection when no dag is passed #424

Closed: leahmcguire closed 5 years ago

leahmcguire commented 5 years ago

Related issues: Refer to issue(s) addressed in this pull request from Issues page.

Describe the proposed solution: A clear and concise description of what the changes are.

Describe alternatives you've considered: A clear and concise description of any alternative solutions or features you've considered.

Additional context: Add any other context about the changes here.

tovbinm commented 5 years ago

Is there a way to test it? Should a user be able to disable it if needed?

codecov[bot] commented 5 years ago

Codecov Report

Merging #424 into master will decrease coverage by 0.03%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #424      +/-   ##
==========================================
- Coverage   86.99%   86.95%   -0.04%     
==========================================
  Files         337      337              
  Lines       11078    11078              
  Branches      369      597     +228     
==========================================
- Hits         9637     9633       -4     
- Misses       1441     1445       +4

Impacted Files                                           Coverage Δ
...op/stages/impl/tuning/OpTrainValidationSplit.scala    100% <ø> (ø) ↑
...orce/op/stages/impl/tuning/OpCrossValidation.scala    97.95% <ø> (ø) ↑
...cala/com/salesforce/op/cli/gen/ProblemSchema.scala    91.37% <0%> (-5.18%) ↓
...in/scala/com/salesforce/op/cli/gen/AvroField.scala    74.35% <0%> (-2.57%) ↓

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 5d38090...eeb0e1e.
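The report's summary line says 0.03% while its diff table says -0.04%; both figures can be reproduced from the same hit and line counts. Below is a minimal sketch of that arithmetic, assuming (the report does not confirm this) that percentages are truncated to two decimals rather than rounded. The object and function names are illustrative only.

```scala
object CoverageArithmetic {
  // Coverage as a percentage: covered lines (hits) over total lines.
  def coverage(hits: Int, lines: Int): BigDecimal =
    BigDecimal(hits) * 100 / BigDecimal(lines)

  // Assumption: values are truncated, not rounded, to two decimals.
  def truncate2(x: BigDecimal): BigDecimal =
    x.setScale(2, BigDecimal.RoundingMode.DOWN)

  def main(args: Array[String]): Unit = {
    val before = coverage(9637, 11078) // master
    val after  = coverage(9633, 11078) // with #424 merged
    println(truncate2(before))                    // 86.99
    println(truncate2(after))                     // 86.95
    println(truncate2(before) - truncate2(after)) // 0.04 (difference of displayed values)
    println(truncate2(before - after))            // 0.03 (truncated raw difference)
  }
}
```

Under that assumption, the two figures differ only in whether truncation happens before or after the subtraction; the underlying change is four lines out of 11078 either way.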

leahmcguire commented 5 years ago

You disable it by not having a splitter. It is not really testable in our current config: the bug was in the internals of the validation model selection, so it may or may not affect which model wins, but it would not affect the final trained model, since the selected model did have this stage applied correctly.
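To make the "no splitter means disabled" behaviour concrete, here is a minimal sketch in plain Scala collections. It does not use the TransmogrifAI API: `Row`, `Splitter`, `DownsampleMajority`, and `prepareForValidation` are hypothetical stand-ins, and the method name `validationPrepare` is only borrowed from the wording of the PR title. The idea shown is that the preparation step is applied to the training data before candidate models are compared, and an absent splitter leaves the data untouched.

```scala
object ValidationPrepareSketch {
  // Hypothetical labeled row; not a TransmogrifAI type.
  final case class Row(label: Int, feature: Double)

  // Hypothetical stand-in for a splitter: a preparation step that is
  // applied to training data before models are validated against each other.
  trait Splitter {
    def validationPrepare(train: Seq[Row]): Seq[Row]
  }

  // Example preparation: downsample so both labels are equally represented.
  object DownsampleMajority extends Splitter {
    def validationPrepare(train: Seq[Row]): Seq[Row] = {
      val (pos, neg) = train.partition(_.label == 1)
      val n = math.min(pos.size, neg.size)
      pos.take(n) ++ neg.take(n)
    }
  }

  // With a splitter, prepare the data; with None, pass it through unchanged.
  def prepareForValidation(train: Seq[Row], splitter: Option[Splitter]): Seq[Row] =
    splitter.fold(train)(_.validationPrepare(train))

  def main(args: Array[String]): Unit = {
    val train = Seq(Row(1, 0.9), Row(0, 0.1), Row(0, 0.2), Row(0, 0.3))
    println(prepareForValidation(train, Some(DownsampleMajority)).size) // 2
    println(prepareForValidation(train, None).size)                     // 4
  }
}
```

In this sketch, skipping the preparation during selection would only change the data the candidates are compared on, which matches the comment above: the ranking could shift, but a final model trained with the step applied would be unaffected.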

gerashegalov commented 5 years ago

Oh, this is what I was looking into in #246, but it turned out to be a non-issue in my investigation. We do indeed need this covered by tests (re @tovbinm), given the splitter lifecycle issues we have faced.

salesforce-cla[bot] commented 4 years ago

Thanks for the contribution! It looks like @leahmcguire is an internal user so signing the CLA is not required. However, we need to confirm this.

salesforce-cla[bot] commented 3 years ago

Thanks for the contribution! Unfortunately we can't verify the commit author(s): leahmcguire l***@s***.com. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.