rapidsai / deployment

RAPIDS Deployment Documentation
https://docs.rapids.ai/deployment/stable/
9 stars 28 forks source link

[ci] use `nbqa` or `ruff` to auto-format and lint notebooks? #333

Closed jameslamb closed 4 months ago

jameslamb commented 7 months ago

The nbqa project (nbQA-dev/nbQA) understands the Ipython notebook format and knows how to run common Python static analysis and autoformatting tools over them.

For example:

conda create --name deployment-dev \
    python=3.10 \
    nbqa \
    flake8

source activate deployment-dev
nbqa flake8 \
   . \
   --extend-ignore 'E402,E501'

As of this writing, that finds some legitimate correctness and efficiency issues:

source/examples/rapids-1brc-single-node/notebook.ipynb:cell_3:8:17: F541 f-string is missing placeholders
source/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook.ipynb:cell_23:7:5: F401 'dask.distributed.wait' imported but unused
source/examples/rapids-azureml-hpo/notebook.ipynb:cell_7:5:29: F821 undefined name 'os'
source/examples/rapids-azureml-hpo/notebook.ipynb:cell_8:7:10: F821 undefined name 'os'
source/examples/rapids-azureml-hpo/notebook.ipynb:cell_9:1:1: F401 'azure.ai.ml.sweep.MedianStoppingPolicy' imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell_1:3:17: F821 undefined name 'cluster'
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell_2:1:1: F401 'math' imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell_2:2:1: F401 'datetime.datetime' imported but unused
...

It supports a bunch of tools, like this:

nbqa black .
nbqa isort .
nbqa flake8 .

See https://github.com/nbQA-dev/nbQA?tab=readme-ov-file#-examples.

Other nice features of that tool:

Benefits of this work

jacobtomlinson commented 7 months ago

This would be a great addition! Perhaps we could add this to the pre-commit hooks?

jameslamb commented 7 months ago

If you all are open to it, sure! I'd be happy to contribute that.

I did notice pre-commit is run in CI here: https://github.com/rapidsai/deployment/blob/main/.github/workflows/pre-commit.yml

jacobtomlinson commented 7 months ago

Yes please that would be great!

bdice commented 7 months ago

Feel free to copy from cudf — it uses nbqa in pre-commit.

jameslamb commented 7 months ago

excellent, thanks!

jameslamb commented 4 months ago

Working through these in pieces, but thought it'd be useful to share how much is left to go.

Here's what remains to enable ruff on this project's notebooks.

error: Failed to parse source/examples/xgboost-azure-mnmg-daskcloudprovider/notebook.ipynb:23:6:27: Unexpected token '<'
error: Failed to parse source/examples/rapids-sagemaker-higgs/notebook.ipynb:39:1:5: Unexpected token 'ecr'
source/examples/rapids-1brc-single-node/notebook.ipynb:cell 7:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/rapids-1brc-single-node/notebook.ipynb:cell 10:8:17: F541 [*] f-string without any placeholders
source/examples/rapids-1brc-single-node/notebook.ipynb:cell 12:13:121: E501 Line too long (129 > 120)
source/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook.ipynb:cell 43:2:1: E402 Module level import not at top of cell
source/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook.ipynb:cell 43:3:1: E402 Module level import not at top of cell
source/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook.ipynb:cell 51:6:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook.ipynb:cell 51:7:42: F401 [*] `dask.distributed.wait` imported but unused
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 6:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 13:10:1: E722 Do not use bare `except`
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 23:2:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 23:5:29: F821 Undefined name `os`
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 26:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 26:7:10: F821 Undefined name `os`
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 26:17:121: E501 Line too long (158 > 120)
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 26:18:121: E501 Line too long (142 > 120)
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 31:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/rapids-azureml-hpo/notebook.ipynb:cell 31:1:48: F401 [*] `azure.ai.ml.sweep.MedianStoppingPolicy` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 4:3:17: F821 Undefined name `cluster`
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:1:8: F401 [*] `math` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:2:22: F401 [*] `datetime.datetime` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:4:8: F401 [*] `cudf` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:5:8: F401 [*] `dask` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:13:22: F401 [*] `dateutil.parser` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:14:1: E401 [*] Multiple imports on one line
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:14:8: F401 [*] `configparser` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:14:22: F401 [*] `os` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 7:14:26: F401 [*] `contextlib` imported but unused
source/examples/rapids-ec2-mnmg/notebook.ipynb:cell 27:3:1: F821 Undefined name `cluster`
source/examples/rapids-optuna-hpo/notebook.ipynb:cell 3:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/rapids-optuna-hpo/notebook.ipynb:cell 3:2:8: F401 [*] `cuml` imported but unused
source/examples/rapids-optuna-hpo/notebook.ipynb:cell 3:3:8: F401 [*] `dask_cudf` imported but unused
source/examples/rapids-optuna-hpo/notebook.ipynb:cell 3:4:17: F401 [*] `numpy` imported but unused
source/examples/rapids-optuna-hpo/notebook.ipynb:cell 3:7:8: F401 [*] `dask` imported but unused
source/examples/rapids-optuna-hpo/notebook.ipynb:cell 3:14:44: F401 [*] `dask.distributed.performance_report` imported but unused
source/examples/rapids-optuna-hpo/notebook.ipynb:cell 7:1:8: F811 [*] Redefinition of unused `os` from cell 3, line 6
source/examples/rapids-optuna-hpo/notebook.ipynb:cell 9:1:18: F401 [*] `pandas` imported but unused
source/examples/rapids-sagemaker-higgs/notebook.ipynb:cell 39:1:5: E999 SyntaxError: Unexpected token 'ecr'
source/examples/time-series-forecasting-with-hpo/notebook.ipynb:cell 23:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/time-series-forecasting-with-hpo/notebook.ipynb:cell 23:13:12: UP031 Use format specifiers instead of percent format
source/examples/time-series-forecasting-with-hpo/notebook.ipynb:cell 23:18:9: UP032 [*] Use f-string instead of `format` call
source/examples/time-series-forecasting-with-hpo/notebook.ipynb:cell 90:11:13: E741 Ambiguous variable name: `l`
source/examples/time-series-forecasting-with-hpo/notebook.ipynb:cell 103:12:26: F541 [*] f-string without any placeholders
source/examples/time-series-forecasting-with-hpo/notebook.ipynb:cell 109:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/time-series-forecasting-with-hpo/notebook.ipynb:cell 120:46:5: F841 Local variable `t` is assigned to but never used
source/examples/time-series-forecasting-with-hpo/notebook.ipynb:cell 143:4:9: F841 Local variable `product_weights` is assigned to but never used
source/examples/xgboost-azure-mnmg-daskcloudprovider/notebook.ipynb:cell 23:6:27: E999 SyntaxError: Unexpected token '<'
source/examples/xgboost-azure-mnmg-daskcloudprovider/notebook.ipynb:cell 23:21:121: E501 Line too long (123 > 120)
source/examples/xgboost-azure-mnmg-daskcloudprovider/notebook.ipynb:cell 30:1:121: E501 Line too long (146 > 120)
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 4:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 4:2:18: F401 [*] `time.time` imported but unused
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 4:5:18: F401 [*] `pandas` imported but unused
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 4:7:8: F401 [*] `cupy` imported but unused
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 4:8:8: F401 [*] `cudf` imported but unused
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 4:9:8: F401 [*] `dask` imported but unused
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 4:10:26: F401 [*] `dask.dataframe` imported but unused
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 8:14:121: E501 Line too long (128 > 120)
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 12:4:4: F821 Undefined name `spark`
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 12:10:12: F821 Undefined name `spark`
source/examples/xgboost-dask-databricks/notebook.ipynb:cell 13:1:9: F821 Undefined name `spark`
source/examples/xgboost-gpu-hpo-job-parallel-k8s/notebook.ipynb:cell 20:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/xgboost-gpu-hpo-job-parallel-ngc/notebook.ipynb:cell 14:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb:cell 3:1:1: I001 [*] Import block is un-sorted or un-formatted
source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb:cell 3:4:8: F401 [*] `cuml` imported but unused
source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb:cell 3:10:38: F401 [*] `dask.distributed.wait` imported but unused
source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb:cell 3:13:21: F401 [*] `sklearn.datasets` imported but unused
source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb:cell 3:21:8: F401 [*] `gzip` imported but unused
source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb:cell 11:1:8: F811 [*] Redefinition of unused `time` from cell 3, line 1
source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb:cell 19:27:9: UP032 [*] Use f-string instead of `format` call
source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb:cell 20:8:11: UP032 [*] Use f-string instead of `format` call
Found 70 errors.

Got those by modifying the ruff hook like this in .pre-commit-config.yaml.

- id: ruff
  types_or: [jupyter, python]
jameslamb commented 4 months ago

🎉 woo! Thanks for all the reviews @jacobtomlinson