mlflow / mlflow-export-import

Use pre-commit to standardize code formatting #31

Open juftin opened 2 years ago

juftin commented 2 years ago

Request Summary

I would like to introduce a tool like pre-commit to handle automatic code formatting and quality checks. This would be very helpful for onboarding new contributors.

Let me know if this is of interest; I'm happy to help implement it.

As a contributor I would like:

Implementation Details

Step 1)

Add a new .pre-commit-config.yaml file at the root of the repo (I'll explain below what this does)

exclude: docs|.git|.tox
default_stages: [commit]
fail_fast: false

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
    -   id: check-yaml
    -   id: check-ast
    -   id: check-docstring-first
    -   id: check-merge-conflict
    -   id: mixed-line-ending

-   repo: https://github.com/timothycrosley/isort
    rev: 5.10.1
    hooks:
    -   id: isort
        args: [--profile, black]

-   repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
    -   id: black-jupyter

-   repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
    rev: v2.4.0
    hooks:
    -   id: pretty-format-yaml
        args: [--autofix, --indent, '4']
    -   id: pretty-format-ini
        args: [--autofix]
    -   id: pretty-format-toml
        args: [--autofix]

# configures pre-commit.ci to keep the hook versions above up to date
ci:
    autoupdate_schedule: weekly
    skip: []
    submodules: false

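The config file above sets up a number of tools that run automatically on edited files whenever a git commit is performed: the general-purpose checks from pre-commit-hooks (trailing whitespace, end-of-file newlines, YAML and Python syntax, docstring placement, merge-conflict markers, line endings), isort for import ordering, black for code formatting (including Jupyter notebooks), and formatters for YAML, INI, and TOML files.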
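
One detail worth checking in the config is the top-level exclude value. As far as I understand, pre-commit treats it as a Python regular expression and searches for it anywhere in each file path, so the unescaped dots in docs|.git|.tox match any character. The sketch below illustrates that behavior with plain `re`; the sample paths and the stricter `STRICT_EXCLUDE` pattern are hypothetical and not part of the proposal.

```python
import re

# The `exclude` value from the config above, treated as a Python regex.
EXCLUDE = re.compile(r"docs|.git|.tox")

# Hypothetical stricter alternative: escape the dots and anchor each
# directory name at the start of the path.
STRICT_EXCLUDE = re.compile(r"^(docs|\.git|\.tox)/")


def is_excluded(pattern: re.Pattern, path: str) -> bool:
    """Return True if `path` would be skipped under `pattern` (search semantics)."""
    return pattern.search(path) is not None


# Hypothetical repository paths, for illustration only.
paths = [
    "docs/index.md",
    ".tox/py38/lib/site.py",
    "mlflow_export_import/utils.py",
    "tests/digit_utils.py",  # contains "igit", which the loose `.git` matches
]

for p in paths:
    print(p, is_excluded(EXCLUDE, p), is_excluded(STRICT_EXCLUDE, p))
```

With the loose pattern, a file such as tests/digit_utils.py would be skipped by accident, while the anchored variant only skips the three intended directories.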

Step 2)

Commit the above file and install pre-commit:

pip install pre-commit
pre-commit install
pre-commit autoupdate

Run a one-time cleanup of the entire codebase:

pre-commit run --all-files

Step 3)

Push all of these changes to GitHub. This can be a painful part of adopting a tool like pre-commit, since there will be a massive diff. I recommend that the original maintainer be the one to push those changes so that the git blame history is retained.

Step 4)

Add some details for new contributors. I have an example here that I try to reuse across GitHub: https://juftin.com/camply/contributing.html

Step 5)

Nothing. New contributors' code will be auto-formatted during commit, and they'll learn an awesome tool while they're at it.

amesar commented 2 years ago

Sounds interesting, let me check it out. The only issue might be that Databricks notebooks are saved as ".py" files, so we'd have to exclude them somehow.
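
One possible way to find those files is sketched below. It assumes that notebooks exported in Databricks source format start with the marker line "# Databricks notebook source". The helper names `is_databricks_notebook` and `build_exclude_regex` are hypothetical, and the resulting pattern is only a starting point for a pre-commit exclude entry, not a tested configuration for this repo.

```python
import re
from pathlib import Path

# Assumption: Databricks notebooks exported as source begin with this line.
DATABRICKS_HEADER = "# Databricks notebook source"


def is_databricks_notebook(path: Path) -> bool:
    """Return True if the first line of `path` is the Databricks marker."""
    try:
        with path.open(encoding="utf-8") as f:
            return f.readline().strip() == DATABRICKS_HEADER
    except (OSError, UnicodeDecodeError):
        # Unreadable or non-text files are treated as regular files.
        return False


def build_exclude_regex(root: Path) -> str:
    """Build a regex that matches every Databricks notebook under `root`."""
    notebooks = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.py")
        if is_databricks_notebook(p)
    )
    if not notebooks:
        return "^$"  # matches no file path, so nothing is excluded
    return "^(" + "|".join(re.escape(n) for n in notebooks) + ")$"
```

Calling `build_exclude_regex(Path("."))` from the repo root would return a pattern that could be pasted into an exclude setting. If the notebooks all live in one directory, excluding that directory by name is simpler.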

amesar commented 2 years ago

We might also want to check how the core mlflow repo handles code formatting, so that we stay compatible with it.