salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars, 392 forks

Fix OpWorkflowModelLocalTest due to flaky XGBoost training #494

Closed: TuanNguyen27 closed this 4 years ago

TuanNguyen27 commented 4 years ago

Source of flakiness: with its default settings, BinaryClassificationModelSelector.withTrainValidationSplit sometimes produces a training set that contains only positive or only negative labels, which makes XGBoost training fail.
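To see why a random split can fail this way, here is a minimal Scala sketch using plain `scala.util.Random` (not Spark or TransmogrifAI). It assumes a hypothetical, heavily imbalanced label set (1 positive, 9 negatives) and an assumed 75% train ratio, assigns each row to train or validation independently (similar in spirit to Spark's `Dataset.randomSplit`), and counts how often the training partition ends up with a single class. The object and method names are illustrative only.

```scala
import scala.util.Random

object SingleClassSplitDemo {

  /** Assigns each row to train with probability `trainRatio` (a per-row Bernoulli split). */
  def randomSplit(labels: Seq[Double], trainRatio: Double, rng: Random): (Seq[Double], Seq[Double]) = {
    // Draw one coin flip per row, then partition rows by that flip.
    val inTrain = labels.map(_ => rng.nextDouble() < trainRatio)
    val (train, valid) = labels.zip(inTrain).partition(_._2)
    (train.map(_._1), valid.map(_._1))
  }

  /** True when the labels contain at least one positive (1.0) and one negative (0.0). */
  def hasBothClasses(labels: Seq[Double]): Boolean =
    labels.contains(1.0) && labels.contains(0.0)

  def main(args: Array[String]): Unit = {
    // Hypothetical imbalanced data: one positive, nine negatives.
    val labels = 1.0 +: Seq.fill(9)(0.0)
    // Repeat the split with 1000 different seeds, mimicking 1000 unseeded test runs.
    val singleClass = (0 until 1000).count { seed =>
      val (train, _) = randomSplit(labels, trainRatio = 0.75, new Random(seed))
      !hasBothClasses(train)
    }
    println(s"training splits with a single class: $singleClass / 1000")
  }
}
```

Because the lone positive lands in the validation partition with probability 0.25, roughly a quarter of these splits leave the training set with negatives only, which is the kind of input XGBoost rejects.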

We address this flakiness by fixing the seed of the DataSplitter used by withTrainValidationSplit, so that every run of the test produces the same train/test split.
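A fixed seed makes the split a pure function of the data. The sketch below reuses the same hypothetical row-by-row split, with a seeded `scala.util.Random` standing in for the seeded `DataSplitter`; it is not the TransmogrifAI API, and the seed value 42 is an arbitrary assumption rather than the one used in this PR.

```scala
import scala.util.Random

object SeededSplitDemo {

  /** Hypothetical per-row Bernoulli split whose only source of randomness is `seed`. */
  def split(labels: Seq[Double], trainRatio: Double, seed: Long): (Seq[Double], Seq[Double]) = {
    val rng = new Random(seed)
    val inTrain = labels.map(_ => rng.nextDouble() < trainRatio)
    val (train, valid) = labels.zip(inTrain).partition(_._2)
    (train.map(_._1), valid.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    val labels = 1.0 +: Seq.fill(9)(0.0)
    val seed = 42L // hypothetical fixed seed
    // Two "runs" with the same seed yield exactly the same partition.
    val first = split(labels, 0.75, seed)
    val second = split(labels, 0.75, seed)
    println(s"identical splits across runs: ${first == second}")
  }
}
```

Fixing the seed makes the test deterministic, but it only removes the failure if that particular seed yields a training set with both classes; stratifying the split by label is a common alternative that guarantees this.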

```
ml.dmlc.xgboost4j.java.XGBoostError: [16:55:13] /xgboost/src/metric/rank_metric.cc:515: Check failed: !auc_error: AUC-PR: the dataset only contains pos or neg samples
```

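The check in `rank_metric.cc` exists because AUC-PR is ill-defined on single-class data: with no positives, recall TP / (TP + FN) divides by zero, and with no negatives precision is trivially 1 wherever it is defined. The sketch below uses average precision as a simple stand-in for AUC-PR (XGBoost's own `aucpr` computation differs in detail) and returns `None` in the same single-class situation; `AucPrGuard` is a hypothetical name.

```scala
object AucPrGuard {

  /**
   * Average precision over examples ranked by descending score:
   * the mean, over positive examples, of the precision at that example's rank.
   * Returns None when the labels are all positive or all negative.
   */
  def aucPr(scores: Seq[Double], labels: Seq[Int]): Option[Double] = {
    require(scores.length == labels.length, "scores and labels must have the same length")
    val numPos = labels.count(_ == 1)
    val numNeg = labels.length - numPos
    if (numPos == 0 || numNeg == 0) None // single-class data: refuse, like XGBoost's check
    else {
      // Rank examples by score, highest first.
      val ranked = scores.zip(labels).sortWith(_._1 > _._1)
      // Walk the ranking, tracking true positives so far and the running sum of precisions.
      val (_, precisionSum) = ranked.zipWithIndex.foldLeft((0, 0.0)) {
        case ((tp, acc), ((_, label), rank)) =>
          if (label == 1) (tp + 1, acc + (tp + 1).toDouble / (rank + 1))
          else (tp, acc)
      }
      Some(precisionSum / numPos)
    }
  }

  def main(args: Array[String]): Unit = {
    println(aucPr(Seq(0.9, 0.8, 0.7), Seq(0, 1, 1))) // mixed labels: a defined score
    println(aucPr(Seq(0.9, 0.8, 0.7), Seq(1, 1, 1))) // positives only: None
  }
}
```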
codecov[bot] commented 4 years ago

Codecov Report

Merging #494 into master will decrease coverage by 3.80%. The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##           master     #494      +/-   ##
==========================================
- Coverage   82.63%   78.83%   -3.81%     
==========================================
  Files         345      345              
  Lines       11702    11702              
  Branches      388      388              
==========================================
- Hits         9670     9225     -445     
- Misses       2032     2477     +445     
```
| Impacted Files | Coverage Δ |
|---|---|
| ...scala/com/salesforce/op/utils/text/TextUtils.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| .../scala/com/salesforce/op/test/FeatureAsserts.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| ...ala/com/salesforce/op/readers/CSVAutoReaders.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| ...la/com/salesforce/op/test/TestFeatureBuilder.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| ...om/salesforce/op/stages/impl/feature/OpNGram.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| ...alesforce/op/stages/impl/feature/OpHashingTF.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| ...lesforce/op/stages/impl/feature/LangDetector.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| ...sforce/op/aggregators/CustomMonoidAggregator.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| ...sforce/op/stages/base/binary/BinaryEstimator.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |
| ...e/op/stages/impl/feature/TextMapLenEstimator.scala | 0.00% <0.00%> (-100.00%) :arrow_down: |

... and 111 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update f764842...2313d09.

gerashegalov commented 4 years ago

Add a description of the flakiness and of how your fix addresses it.