salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Let user specify feature importance type for XGBoost #490

Closed TuanNguyen27 closed 4 years ago

TuanNguyen27 commented 4 years ago

XGBoost's default feature importance type is now gain instead of weight. Switching to gain makes XGBoost feature importance more consistent with Spark's Random Forest, which averages the single-tree importances across all trees in the ensemble.

Description of the available options for importanceType

‘gain’: the average gain across all splits the feature is used in.

‘cover’: the average coverage across all splits the feature is used in.

‘total_gain’: the total gain across all splits the feature is used in.

‘total_cover’: the total coverage across all splits the feature is used in.
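
To make the difference between these options concrete, here is a minimal sketch in plain Python. It does not call XGBoost or TransmogrifAI; the `splits` list and the `importance` helper are hypothetical names invented for illustration, and the gain and cover numbers are made up. It assumes that `weight` (the previous default) counts how many times a feature is used in a split, and that the other four types follow the descriptions above.

```python
from collections import defaultdict

# Hypothetical split records: (feature name, gain of the split, cover of the split).
# The values are invented for illustration; they do not come from a real model.
splits = [
    ("age", 4.0, 100.0),
    ("age", 2.0, 50.0),
    ("income", 9.0, 80.0),
]


def importance(splits, importance_type):
    """Aggregate per-split statistics into one score per feature."""
    count = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1
        total_gain[feature] += gain
        total_cover[feature] += cover

    if importance_type == "weight":
        # Number of splits that use the feature.
        return {f: float(n) for f, n in count.items()}
    if importance_type == "total_gain":
        return dict(total_gain)
    if importance_type == "total_cover":
        return dict(total_cover)
    if importance_type == "gain":
        # Average gain over the splits that use the feature.
        return {f: total_gain[f] / count[f] for f in count}
    if importance_type == "cover":
        # Average cover over the splits that use the feature.
        return {f: total_cover[f] / count[f] for f in count}
    raise ValueError(f"unknown importance type: {importance_type}")


for kind in ("weight", "gain", "cover", "total_gain", "total_cover"):
    print(kind, importance(splits, kind))
```

In this toy data, `weight` ranks `age` first because it is split on twice, while `gain` ranks `income` first because its single split is more informative. This is the kind of ranking change the new default can produce.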

codecov[bot] commented 4 years ago

Codecov Report

Merging #490 into master will not change coverage. The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master     #490    +/-   ##
========================================
  Coverage   86.97%   86.97%            
========================================
  Files         345      345            
  Lines       11684    11684            
  Branches      379      611   +232     
========================================
  Hits        10162    10162            
  Misses       1522     1522            

| Impacted Files | Coverage Δ |
| --- | --- |
| .../ml/dmlc/xgboost4j/scala/spark/XGBoostParams.scala | 95.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 9857138...1e4f8ee.