mercedes-benz/automotive_feature_engineering

Table of Contents

About The Project
How to Clone the Source Code
Package Installation
How to Run Experiments
Contributing
License
Citation

This repository contains supplementary code for the paper "Proposal of an Automated Feature Engineering Pipeline for High-Dimensional Tabular Regression Data Using Reinforcement Learning". Author: Julian Müller (julian.mueller@mercedes-benz.com), on behalf of MBition GmbH.

Provider Information

Source code has been tested solely for our own use cases, which might differ from yours. This project is actively maintained, and contributions are welcome.

About The Project

`automotive_feature_engineering` is a Python package that automates feature engineering for large in-car communication datasets in the automotive industry. It transforms raw data into meaningful input features for machine learning models, improving efficiency and reducing computational overhead. It supports both a static pipeline (a predefined or manually specified sequence of steps) and dynamic feature engineering with reinforcement learning.


How to Clone the Source Code

To clone the source code of this repository to your local machine, follow these steps:

Install Git: Make sure you have Git installed on your computer. If not, you can download it from git-scm.com.
Open a Terminal/Command Prompt: Navigate to the directory where you want to clone the repository.

Clone the Repository: Use the git clone command followed by the repository URL. Run the following command for HTTPS:

git clone https://github.com/mercedes-benz/automotive_feature_engineering.git

Or this one for SSH:

git clone git@github.com:mercedes-benz/automotive_feature_engineering.git


Package Installation

From the root directory of the cloned repository, install the prebuilt wheel with pip:

pip install dist/automotive_feature_engineering-0.1.0-py3-none-any.whl
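To confirm that the installation worked, a small check along these lines can help. It is a sketch that uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`); the helper name `installed_version` is hypothetical, and the distribution name is assumed from the wheel file name.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str, module_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is unavailable."""
    # find_spec returns None when the module cannot be imported in this environment.
    if util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The module is importable but no installed distribution metadata matches.
        return None


# Assumed names, derived from the wheel file name above.
print(installed_version("automotive_feature_engineering", "automotive_feature_engineering"))
```

If the wheel above was installed into the active environment, the call should report version 0.1.0; `None` means Python cannot find the package there.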


How to Run Experiments

Method List

Index	Method	Parameters	Description
0	(none)	-	Leave features unchanged.
1	`drop_correlated_features_09`	-	Drop highly correlated features using a correlation threshold of 0.9.
2	`drop_correlated_features_095`	-	Drop highly correlated features using a correlation threshold of 0.95.
3	`sns_handling_median_8`	-	Fill NaN values with the median in columns with more than 8 unique values.
4	`sns_handling_median_32`	-	Fill NaN values with the median in columns with more than 32 unique values.
5	`sns_handling_mean_8`	-	Fill NaN values with the mean in columns with more than 8 unique values.
6	`sns_handling_mean_32`	-	Fill NaN values with the mean in columns with more than 32 unique values.
7	`sns_handling_zero_8`	-	Fill NaN values with 0 in columns with more than 8 unique values.
8	`sns_handling_zero_32`	-	Fill NaN values with 0 in columns with more than 32 unique values.
9	`filter_by_variance`	-	Remove columns with a variance below 0.1 across datasets.
10	`ohe`	-	Apply one-hot encoding to categorical variables.
11	`feature_importance_filter_00009999`	-	Remove features with an importance below 0.00009999.
12	`feature_importance_filter_00049999`	-	Remove features with an importance below 0.00049999.
13	`pca`	-	Apply a Principal Component Analysis (PCA) transformation to reduce dimensionality.
14	`polynominal_features`	-	Extend the feature set with polynomial terms.
99	`filter_by_variance_0`	-	Remove columns with only one unique value across datasets.
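The table gives only one-line descriptions. The sketch below shows one common way to implement three of the rows with pandas (`drop_correlated_features_09`, `sns_handling_median_8` and `filter_by_variance`). It is not the package's code: the function names are hypothetical, and fitting statistics on the training data and reusing them for the test data, strict `>`/`<` comparisons and numeric-only handling are all assumptions.

```python
import numpy as np
import pandas as pd


def drop_correlated(df_train, df_test, threshold=0.9):
    """Drop one column of every pair whose absolute Pearson correlation exceeds threshold."""
    corr = df_train.corr(numeric_only=True).abs()
    # Keep only the upper triangle (above the diagonal) so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df_train.drop(columns=to_drop), df_test.drop(columns=to_drop)


def fill_nan_median(df_train, df_test, min_unique=8):
    """Fill NaNs with the training median in numeric columns with > min_unique values."""
    cols = [c for c in df_train.select_dtypes("number").columns
            if df_train[c].nunique() > min_unique]
    medians = df_train[cols].median()
    # fillna with a Series fills each column by its label; other columns stay untouched.
    return df_train.fillna(medians), df_test.fillna(medians)


def filter_by_variance(df_train, df_test, threshold=0.1):
    """Remove numeric columns whose training variance is below threshold."""
    variances = df_train.var(numeric_only=True)
    low = variances[variances < threshold].index.tolist()
    return df_train.drop(columns=low), df_test.drop(columns=low)
```

Computing the statistics on the training data only keeps the training and test frames aligned and avoids leaking test information into the transformation.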


Documentation for Using the Static and Manual Methods

Overview

The static and manual methods in the automotive_feature_engineering package perform feature engineering on automotive datasets. The static method runs a predefined sequence of feature engineering steps, while the manual method lets you specify your own sequence.
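Both methods amount to applying a list of steps in order. A minimal sketch of that idea follows, assuming each step takes and returns the training and test DataFrames together. The step names, functions and `run_sequence` are hypothetical and are not the package's methods.

```python
from typing import Callable, Dict, List, Optional, Tuple

import pandas as pd

# A step transforms training and test data together so both keep the same columns.
Step = Callable[[pd.DataFrame, pd.DataFrame], Tuple[pd.DataFrame, pd.DataFrame]]


def drop_single_value_columns(df_train: pd.DataFrame, df_test: pd.DataFrame):
    """Hypothetical step: drop columns holding one unique value in the training data."""
    cols = [c for c in df_train.columns if df_train[c].nunique(dropna=False) <= 1]
    return df_train.drop(columns=cols), df_test.drop(columns=cols)


def fill_nan_with_zero(df_train: pd.DataFrame, df_test: pd.DataFrame):
    """Hypothetical step: replace missing values with 0."""
    return df_train.fillna(0), df_test.fillna(0)


STEPS: Dict[str, Step] = {
    "drop_single_value_columns": drop_single_value_columns,
    "fill_nan_with_zero": fill_nan_with_zero,
}
DEFAULT_SEQUENCE: List[str] = ["drop_single_value_columns", "fill_nan_with_zero"]


def run_sequence(df_train, df_test, method_list: Optional[List[str]] = None):
    """Apply the default sequence (static) or a user-supplied one (manual) in order."""
    for name in DEFAULT_SEQUENCE if method_list is None else method_list:
        df_train, df_test = STEPS[name](df_train, df_test)
    return df_train, df_test
```

Passing `method_list=None` mirrors the static method, while an explicit list mirrors the manual method.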

Initialization Parameters

Parameter	Type	Description	Default Value
df_train	`pd.DataFrame`	Training data.	Required
df_test	`pd.DataFrame`	Test data.	Required
model	`str`	Model to be used for feature selection. Options: `etree`, `randomforest`.	Required
target_names_list	`List[str]`	List of target names.	Required
import_joblib_path	`str`, optional	Path to a joblib file of previously exported feature engineering methods to import.	`None`
alt_docu_path	`str`, optional	Alternative documentation path.	`None`
alt_config	`Dict`, optional	Alternative configuration dictionary.	`None`
unrelated_cols	`List[str]`, optional	List of columns that are not considered in feature engineering.	`None`
model_export	`bool`	Whether to export the model.	`False`
fe_export_joblib	`bool`	Whether to export the feature engineering methods used.	`False`
explainable	`bool`	If set to True, a pipeline without polynomial features is used.	`False`
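The `explainable` flag switches off polynomial features (`polynominal_features` in the Method List). The sketch below is not the package's implementation; it assumes degree-2 terms and shows how such terms are typically built with pandas. It also shows one likely reason to avoid them when interpretability matters: n input columns yield n(n+1)/2 extra product columns, which quickly become hard to explain. `add_degree2_terms` is a hypothetical name.

```python
from itertools import combinations_with_replacement

import pandas as pd


def add_degree2_terms(df: pd.DataFrame) -> pd.DataFrame:
    """Append all squares and pairwise products of the numeric columns."""
    numeric = df.select_dtypes("number")
    new_cols = {
        f"{a}*{b}": numeric[a] * numeric[b]
        for a, b in combinations_with_replacement(numeric.columns, 2)
    }
    # Build all new columns at once instead of inserting them one by one.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
```

For three numeric columns, the result has the 3 original columns plus 6 degree-2 terms.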

Step-by-Step Guide

Prepare Your Data

Prepare your training and test datasets as `pd.DataFrame` objects that contain the target columns named in `target_names_list`.
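A minimal sketch of this step, assuming your signals arrive in a single table with the target column alongside the features. It uses a random 80/20 split with pandas; the column names and the helper `split_frame` are hypothetical. For time-series recordings, splitting by recording or time range may be more appropriate than a random split.

```python
import numpy as np
import pandas as pd


def split_frame(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    """Randomly split one DataFrame into training and test DataFrames."""
    df_test = df.sample(frac=test_fraction, random_state=seed)
    df_train = df.drop(index=df_test.index)
    return df_train.reset_index(drop=True), df_test.reset_index(drop=True)


# Hypothetical recording: two signals and one target column.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "signal_a": rng.normal(size=100),
    "signal_b": rng.normal(size=100),
})
data["target"] = 2.0 * data["signal_a"] + rng.normal(scale=0.1, size=100)

df_train, df_test = split_frame(data)
target_names_list = ["target"]  # assumed to name columns present in both frames
```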

Model Execution Methods

Call the Static Method

With your DataFrames ready, call the static method. Pass the model type (`etree` or `randomforest`) and the list of target names, and set the optional parameters listed above as needed. The static method does not require a method list because it uses a predefined sequence of methods.

# Import function
from automotive_feature_engineering import static

# Execute the static method
results = static(df_train, df_test, model, target_names_list)

Without a method list, the default pipeline is used; to run a custom sequence, use the manual method described next.

Call the Manual Method

To specify your own sequence of feature engineering steps, use the manual method. Pass a method list (see the Method List above) as the first argument, followed by the DataFrames, the model type and the target names.

# Import function
from automotive_feature_engineering import manual

# Execute the manual method
results = manual(method_list, df_train, df_test, model, target_names_list)


Documentation for Using the RL Method

Overview

The RL method in the automotive_feature_engineering package performs dynamic feature engineering on automotive datasets using reinforcement learning. It adaptively selects and applies feature engineering steps to the input DataFrames to produce features for predictive modeling and further analysis.
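This section does not describe the agent itself. As a minimal illustration of the underlying idea (learning from a reward signal which feature engineering step pays off), the sketch below uses an epsilon-greedy multi-armed bandit, the simplest reinforcement learning setting. It is not the algorithm from the paper. `epsilon_greedy` is a hypothetical name, and `reward_fn` is an assumed callable that would, for example, return a validation score of a model trained after applying the given step.

```python
import random
from typing import Callable, Dict, List


def epsilon_greedy(actions: List[str], reward_fn: Callable[[str], float],
                   episodes: int = 100, epsilon: float = 0.1,
                   seed: int = 0) -> Dict[str, float]:
    """Estimate each action's mean reward with an epsilon-greedy multi-armed bandit."""
    rng = random.Random(seed)
    values: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    # Try every action once so each estimate starts from an observed reward.
    for action in actions:
        values[action] = reward_fn(action)
        counts[action] = 1
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)           # explore a random step
        else:
            action = max(actions, key=values.get)  # exploit the best estimate so far
        counts[action] += 1
        # Incremental running mean of the rewards observed for this action.
        values[action] += (reward_fn(action) - values[action]) / counts[action]
    return values
```

The action with the highest estimate, `max(values, key=values.get)`, is the step the agent would prefer; a full RL formulation would additionally condition on the current state of the feature set and choose whole sequences of steps.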

Initialization Parameters

Parameter	Type	Description	Default Value
df_train	`pd.DataFrame`	Training data used in reinforcement learning.	Required
df_train_origin	`pd.DataFrame`	Original training data.	Required
df_test_origin	`pd.DataFrame`	Original test data.	Required
model	`str`	Model to be used for feature selection. Options: `etree`, `randomforest`.	Required
target_names_list	`List[str]`	List of target names.	Required
rl_raster	`float`	Sampling rate of input data.	Required
alt_docu	`str`, optional	Alternative documentation path.	`None`
alt_config	`Dict`, optional	Alternative configuration dictionary.	`None`
unrelated_cols	`List[str]`, optional	List of columns that are not considered in feature engineering.	`None`

# Import function
from automotive_feature_engineering import rl

# Execute the rl method
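results = rl(df_train=df_train, df_train_origin=df_train_origin, df_test_origin=df_test_origin,
             target_names_list=target_names_list, model=model, rl_raster=rl_raster,
             unrelated_cols=unrelated_cols, alt_config=alt_config, alt_docu=alt_docu)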

For more examples, please refer to the documentation.


Contributing

The instructions on how to contribute can be found in the file CONTRIBUTING.md in this repository.


License

The code is published under the MIT license. For details, see the LICENSE.md file in this repository.


Citation

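@article{key2023,
  title={Proposal of an Automated Feature Engineering Pipeline for High-Dimensional Tabular Regression Data Using Reinforcement Learning},
  author={},
  year={2023},
  url={}
}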
