Closed: Jauntbox closed this pull request 4 years ago
Merging #501 into master will increase coverage by 1.48%. The diff coverage is n/a.
```
@@            Coverage Diff             @@
##           master     #501      +/-   ##
==========================================
+ Coverage   80.02%   81.51%   +1.48%
==========================================
  Files         346      346
  Lines       11782    11782
  Branches      385      385
==========================================
+ Hits         9429     9604     +175
+ Misses       2353     2178     -175
```
Impacted Files | Coverage Δ | |
---|---|---|
...ala/com/salesforce/op/utils/tuples/RichTuple.scala | 0.00% <0.00%> (-100.00%) | :arrow_down: |
...alesforce/op/aggregators/TimeBasedAggregator.scala | 0.00% <0.00%> (-100.00%) | :arrow_down: |
...stages/impl/feature/TimePeriodMapTransformer.scala | 0.00% <0.00%> (-100.00%) | :arrow_down: |
...e/op/stages/impl/insights/RecordInsightsCorr.scala | 0.00% <0.00%> (-98.25%) | :arrow_down: |
utils/src/main/scala/com/salesforce/op/UID.scala | 0.00% <0.00%> (-91.67%) | :arrow_down: |
...op/stages/impl/preparators/MinVarianceFilter.scala | 0.00% <0.00%> (-91.31%) | :arrow_down: |
...es/src/main/scala/com/salesforce/op/OpParams.scala | 0.00% <0.00%> (-85.72%) | :arrow_down: |
...ala/com/salesforce/op/stages/SparkStageParam.scala | 0.00% <0.00%> (-77.42%) | :arrow_down: |
...a/com/salesforce/op/utils/spark/RichMetadata.scala | 15.78% <0.00%> (-73.69%) | :arrow_down: |
...la/com/salesforce/op/utils/spark/RichDataset.scala | 15.38% <0.00%> (-70.77%) | :arrow_down: |
... and 103 more | | |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5aef4f...30cd461. Read the comment docs.
Oof, didn't realize you weren't a "code owner", I guess @leahmcguire needs to approve this too
Thanks for the contribution! It looks like @Jauntbox is an internal user so signing the CLA is not required. However, we need to confirm this.
Related issues N/A
Describe the proposed solution The current local scoring tests are flaky when XGBoost models are included, because they use a tiny, hardcoded 8-row dataset. With so few rows, the train/validation splits often end up containing only a single class, which causes the XGBoost models to throw an error.
This PR replaces that dataset with a synthetic dataset of adjustable size (currently 100 rows), which fixes the problem.
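The project itself is Scala, but the class-imbalance argument above is easy to check numerically. Here is a hypothetical Python sketch (the function name, split fraction, and class counts are illustrative, not taken from the PR) estimating how often a random train/validation split of an 8-row dataset leaves one side with a single class, compared with a 100-row dataset:

```python
import random

def single_class_split_rate(labels, train_frac=0.75, trials=10_000, seed=42):
    """Estimate the probability that a random train/validation split
    leaves either side with only one class (the condition that makes
    a gradient-boosted classifier like XGBoost fail to train)."""
    rng = random.Random(seed)
    n_train = int(len(labels) * train_frac)
    bad = 0
    for _ in range(trials):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        train, valid = shuffled[:n_train], shuffled[n_train:]
        if len(set(train)) < 2 or len(set(valid)) < 2:
            bad += 1
    return bad / trials

# A tiny 8-row dataset with 2 positives vs. a 100-row dataset with 25 positives.
tiny = [1, 1] + [0] * 6
large = [1] * 25 + [0] * 75
print(f"8 rows:   {single_class_split_rate(tiny):.2f}")    # well over half the splits are degenerate
print(f"100 rows: {single_class_split_rate(large):.2f}")   # essentially never
```

With 8 rows and a 6/2 split, any validation set drawn entirely from the majority class is already degenerate, so most random splits fail; at 100 rows the failure rate is negligible, which is the intuition behind growing the synthetic dataset.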
Describe alternatives you've considered We could also have used the full hardcoded Titanic dataset, but this was much easier for me.
Additional context N/A