stefan-grafberger / mlwhatif

Data-Centric What-If Analysis for Native Machine Learning Pipelines
Apache License 2.0

mlwhatif


Data-Centric What-If Analysis for Native Machine Learning Pipelines.

This project uses the mlinspect project as a foundation, mainly for its plan extraction from native ML pipelines.

Run mlwhatif locally

Prerequisite: Python 3.9

  1. Clone this repository (optionally, with Git LFS, to also download the datasets for the scalability experiment)

  2. Set up the environment

    cd mlwhatif
    python -m venv venv
    source venv/bin/activate

  3. If you want to use the visualisation functions we provide, install graphviz, which cannot be installed via pip

    Linux: apt-get install graphviz
    macOS: brew install graphviz

  4. Install pip dependencies

    SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e ."[dev]"

  5. To ensure everything works, you can run the tests (without graphviz, the visualisation test will fail)

    python setup.py test
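Before running the tests, it can help to confirm the two prerequisites named above. The following is a minimal sketch, not part of mlwhatif: the function name `check_prerequisites` is hypothetical, it assumes that Graphviz is detectable through its `dot` executable on the `PATH`, and it treats any Python version other than 3.9 as worth a warning, since only 3.9 is named here.

```python
import shutil
import sys


def check_prerequisites(required=(3, 9)):
    """Return a list of human-readable problems; the list is empty if none were found."""
    problems = []

    # Only Python 3.9 is named as a prerequisite, so compare major.minor exactly
    # (assumption: other versions are untested rather than known to fail).
    current = tuple(sys.version_info[:2])
    if current != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} is expected, "
            f"found {current[0]}.{current[1]}"
        )

    # Graphviz is a system package, so look for its `dot` executable on PATH.
    if shutil.which("dot") is None:
        problems.append(
            "Graphviz `dot` executable not found on PATH; "
            "the visualisation functions and their test need it"
        )

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```

An empty result means both checks passed. A missing `dot` only affects the visualisation functions and the visualisation test.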

How to use mlwhatif

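mlwhatif makes it easy to analyze your pipeline and automatically run what-if analyses. The following example runs a data cleaning what-if analysis on a pipeline stored in a notebook file.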

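from mlwhatif import PipelineAnalyzer
from mlwhatif.analysis import DataCleaning, ErrorType

# Path to the notebook file that contains the pipeline to analyze
IPYNB_PATH = ...

# Map each column to the error type to consider for it;
# ErrorType.MISLABEL is keyed by None instead of a column name
cleanlearn = DataCleaning({'category': ErrorType.CAT_MISSING_VALUES,
                           'vine': ErrorType.CAT_MISSING_VALUES,
                           'star_rating': ErrorType.NUM_MISSING_VALUES,
                           'total_votes': ErrorType.OUTLIERS,
                           'review_id': ErrorType.DUPLICATES,
                           None: ErrorType.MISLABEL
                           })

# Load the pipeline from the notebook, register the analysis, and run it
analysis_result = PipelineAnalyzer \
    .on_pipeline_from_ipynb_file(IPYNB_PATH) \
    .add_what_if_analysis(cleanlearn) \
    .execute()

# Look up the report that belongs to this analysis
cleanlearn_report = analysis_result.analysis_to_result_reports[cleanlearn]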
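The calls above can be wrapped in a small helper that validates the notebook path before the pipeline runs. This is a sketch under stated assumptions rather than part of the mlwhatif API: the helper name `run_data_cleaning_analysis` is hypothetical, it assumes that `on_pipeline_from_ipynb_file` accepts the path as a string, and it uses only the mlwhatif calls shown in the example above.

```python
from pathlib import Path


def run_data_cleaning_analysis(ipynb_path, column_error_types):
    """Run a DataCleaning what-if analysis on a notebook pipeline and return its report.

    `column_error_types` is the same column -> ErrorType mapping that is
    passed to DataCleaning in the example above.
    """
    path = Path(ipynb_path)

    # Fail early with a clear message before any pipeline code is executed.
    if path.suffix != ".ipynb":
        raise ValueError(f"expected a .ipynb file, got: {path}")
    if not path.is_file():
        raise FileNotFoundError(f"notebook not found: {path}")

    # Imported lazily so the path checks above also work without mlwhatif installed.
    from mlwhatif import PipelineAnalyzer
    from mlwhatif.analysis import DataCleaning

    analysis = DataCleaning(column_error_types)
    result = (
        PipelineAnalyzer
        .on_pipeline_from_ipynb_file(str(path))  # assumption: a string path is accepted
        .add_what_if_analysis(analysis)
        .execute()
    )

    # The report is looked up with the analysis object itself as the key.
    return result.analysis_to_result_reports[analysis]
```

Keeping the analysis object in a local variable matters here, because the same object that was registered with `add_what_if_analysis` is the key into `analysis_to_result_reports`.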

Detailed Example

We prepared a demo notebook to showcase mlwhatif and its features.

Notes

Publications

License

This library is licensed under the Apache 2.0 License.