Jauntbox commented 4 years ago

Related issues: N/A

Describe the proposed solution: The current local scoring tests are flaky when xgboost models are included, because they run on a tiny hardcoded dataset of only 8 rows. With so few rows, a train/validation split often ends up containing only a single class, which causes xgboost models to throw an error.
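To illustrate why so few rows make the tests flaky, here is a minimal sketch in plain Scala (no Spark or xgboost) that estimates how often a random split leaves one side with a single class. The object name `SplitFlakiness`, the 25% validation fraction, and any label values passed in are assumptions for illustration; they are not taken from the actual test code.

```scala
import scala.util.Random

// Hypothetical helper, not part of TransmogrifAI.
object SplitFlakiness {

  /** Estimates the fraction of random splits in which either the validation
    * side or the training side contains fewer than two distinct labels.
    */
  def singleClassRate(
    labels: Seq[Int],
    validationFraction: Double,
    trials: Int,
    seed: Long
  ): Double = {
    val rng = new Random(seed)
    // Always hold out at least one row for validation.
    val nValidation = math.max(1, math.round(labels.size * validationFraction).toInt)
    val badSplits = (1 to trials).count { _ =>
      val (validation, train) = rng.shuffle(labels).splitAt(nValidation)
      validation.distinct.size < 2 || train.distinct.size < 2
    }
    badSplits.toDouble / trials
  }
}
```

Calling `SplitFlakiness.singleClassRate(Seq(1, 1, 1, 0, 0, 0, 1, 0), 0.25, 10000, 42L)` on an assumed 8-row label column and comparing it with the same call on `Seq.tabulate(100)(_ % 2)` should show the single-class rate dropping sharply as the dataset grows.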

This PR replaces the hardcoded dataset with a synthetic dataset of adjustable size (currently 100 rows), which fixes the problem.
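A synthetic dataset of adjustable size could be generated along these lines. This is a sketch under stated assumptions, not the code in this PR: the `SyntheticRow` fields, the `SyntheticData.generate` name, and the default seed are all hypothetical. Labels alternate, so both classes are equally represented for any even `n`.

```scala
import scala.util.Random

// Hypothetical row type; the fields used by the real tests may differ.
final case class SyntheticRow(id: Int, label: Double, x1: Double, x2: Double)

// Hypothetical generator, not part of TransmogrifAI.
object SyntheticData {

  /** Generates `n` rows with alternating 0.0/1.0 labels.
    * A fixed seed keeps the data identical across test runs.
    */
  def generate(n: Int, seed: Long = 42L): Seq[SyntheticRow] = {
    val rng = new Random(seed)
    (0 until n).map { i =>
      val label = (i % 2).toDouble
      SyntheticRow(
        id = i,
        label = label,
        // Shift the first feature by the label so the classes are separable but noisy.
        x1 = label + rng.nextGaussian(),
        // The second feature is pure noise.
        x2 = rng.nextDouble()
      )
    }
  }
}
```

Because the size is a parameter, a test can call `SyntheticData.generate(100)` today and grow the dataset later without touching the data itself.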

Describe alternatives you've considered: We could also have used the full hardcoded Titanic dataset, but generating synthetic data was much easier for me.

Additional context: N/A

codecov[bot] commented 4 years ago

Codecov Report

Merging #501 into master will increase coverage by 1.48%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #501      +/-   ##
==========================================
+ Coverage   80.02%   81.51%   +1.48%     
==========================================
  Files         346      346              
  Lines       11782    11782              
  Branches      385      385              
==========================================
+ Hits         9429     9604     +175     
+ Misses       2353     2178     -175

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...ala/com/salesforce/op/utils/tuples/RichTuple.scala | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| ...alesforce/op/aggregators/TimeBasedAggregator.scala | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| ...stages/impl/feature/TimePeriodMapTransformer.scala | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| ...e/op/stages/impl/insights/RecordInsightsCorr.scala | `0.00% <0.00%> (-98.25%)` | :arrow_down: |
| utils/src/main/scala/com/salesforce/op/UID.scala | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
| ...op/stages/impl/preparators/MinVarianceFilter.scala | `0.00% <0.00%> (-91.31%)` | :arrow_down: |
| ...es/src/main/scala/com/salesforce/op/OpParams.scala | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
| ...ala/com/salesforce/op/stages/SparkStageParam.scala | `0.00% <0.00%> (-77.42%)` | :arrow_down: |
| ...a/com/salesforce/op/utils/spark/RichMetadata.scala | `15.78% <0.00%> (-73.69%)` | :arrow_down: |
| ...la/com/salesforce/op/utils/spark/RichDataset.scala | `15.38% <0.00%> (-70.77%)` | :arrow_down: |
| ... and 103 more | | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f5aef4f...30cd461.

Jauntbox commented 4 years ago

Oof, didn't realize you weren't a "code owner", I guess @leahmcguire needs to approve this too

salesforce-cla[bot] commented 3 years ago

Thanks for the contribution! It looks like @Jauntbox is an internal user so signing the CLA is not required. However, we need to confirm this.

salesforce / TransmogrifAI

Refactor flaky local scoring tests #501
