kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Development environment / tests not working for master when run locally #1577

Closed daniel-falk closed 2 years ago

daniel-falk commented 2 years ago

Description

Hi, I'm trying to get the development environment up and running on my Linux machine, but I have some issues with both the tests and the linter.

Steps to Reproduce

  1. Set up a virtual environment with Python 3.10.0
  2. As per the guidelines for contribution, run:
    make install-test-requirements
    make install-pre-commit
    make test
  3. This results in 8 failed tests and 17 errors:
    ================================================================== short test summary info ===================================================================
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:42: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:54: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:70: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:75: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:80: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:89: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [6] tests/extras/datasets/holoviews/test_holoviews_writer.py:95: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:125: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:140: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:151: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:169: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:177: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:183: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:189: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    SKIPPED [1] tests/extras/datasets/holoviews/test_holoviews_writer.py:200: Python 3.10 needs matplotlib>=3.5 which breaks holoviews.
    ERROR tests/extras/datasets/dask/test_parquet_dataset.py::TestParquetDataSet::test_save_data[None-None] - botocore.exceptions.ClientError: An error occurre...
    ERROR tests/extras/datasets/dask/test_parquet_dataset.py::TestParquetDataSet::test_load_data[None-None] - botocore.exceptions.ClientError: An error occurre...
    ERROR tests/extras/datasets/dask/test_parquet_dataset.py::TestParquetDataSet::test_exists[None-None] - botocore.exceptions.ClientError: An error occurred (...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_save_data[None-False-save_args0] - botocore.exceptions.ClientE...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_list_save[None-None-False] - botocore.exceptions.ClientError: ...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_dict_save[None-None-False] - botocore.exceptions.ClientError: ...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_overwrite[None-None-False-8] - botocore.exceptions.ClientError...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_overwrite[None-None-True-3] - botocore.exceptions.ClientError:...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_fs_args - botocore.exceptions.ClientError: An error occurred (...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_open_extra_args[None-False-fs_args0] - botocore.exceptions.Cli...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_load_fail[None-None-False] - botocore.exceptions.ClientError: ...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_exists_single[None-None-False] - botocore.exceptions.ClientErr...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriter::test_exists_multiple[None-None-False] - botocore.exceptions.ClientE...
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriterVersioned::test_versioning_existing_dataset_single_plot[None-None-False-None-None]
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriterVersioned::test_versioning_existing_dataset_list_plot[None-None-False-None-None]
    ERROR tests/extras/datasets/matplotlib/test_matplotlib_writer.py::TestMatplotlibWriterVersioned::test_versioning_existing_dataset_dict_plot[None-None-False-None-None]
    ERROR tests/extras/datasets/spark/test_spark_dataset.py::TestSparkDataSet::test_load_options_schema_path_with_credentials - botocore.exceptions.ClientError...
    FAILED tests/io/test_incremental_dataset.py::TestPartitionedDataSetS3::test_load_and_confirm - AssertionError: assert dict_keys([]) == dict_keys(['p...04/d...
    FAILED tests/io/test_incremental_dataset.py::TestPartitionedDataSetS3::test_load_and_confirm_s3a - AssertionError: assert dict_keys([]) == dict_keys(['p......
    FAILED tests/io/test_incremental_dataset.py::TestPartitionedDataSetS3::test_force_checkpoint_no_checkpoint_file[-expected_partitions0] - AssertionError: as...
    FAILED tests/io/test_incremental_dataset.py::TestPartitionedDataSetS3::test_force_checkpoint_no_checkpoint_file[p00/data.csv-expected_partitions1] - Assert...
    FAILED tests/io/test_incremental_dataset.py::TestPartitionedDataSetS3::test_force_checkpoint_no_checkpoint_file[p03/data.csv-expected_partitions2] - Assert...
    FAILED tests/io/test_incremental_dataset.py::TestPartitionedDataSetS3::test_force_checkpoint_checkpoint_file_exists[-expected_partitions0] - kedro.io.core....
    FAILED tests/io/test_incremental_dataset.py::TestPartitionedDataSetS3::test_force_checkpoint_checkpoint_file_exists[p00/data.csv-expected_partitions1] - ke...
    FAILED tests/io/test_incremental_dataset.py::TestPartitionedDataSetS3::test_force_checkpoint_checkpoint_file_exists[p03/data.csv-expected_partitions2] - ke...
    ======================================= 8 failed, 1920 passed, 20 skipped, 8 warnings, 17 errors in 138.84s (0:02:18) ========================================

Running the make lint target on the master branch produces a diff in two files and therefore fails:

pre-commit run -a --hook-stage manual 
Trim Trailing Whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing docs/source/get_started/example_project.md
Fixing docs/source/tutorial/visualise_pipeline.md

Fix End of Files.........................................................Passed
Check Yaml...............................................................Passed
Check JSON...............................................................Passed
Check for added large files..............................................Passed
Check for case conflicts.................................................Passed
Check for merge conflicts................................................Passed
Debug Statements (Python)................................................Passed
Fix requirements.txt.....................................................Passed
Flake8...................................................................Passed
mypy.....................................................................Passed
blacken-docs.............................................................Passed
pyupgrade................................................................Passed
Sort imports.............................................................Passed
Black....................................................................Passed
Import Linter............................................................Passed
Secret scan..............................................................Passed
Bandit security check....................................................Passed
Pylint on kedro/*........................................................Passed
Pylint on features/*.....................................................Passed
Pylint on tests/*........................................................Passed
diff --git a/docs/source/get_started/example_project.md b/docs/source/get_started/example_project.md
index 8aec3560..ba7ce2dc 100644
--- a/docs/source/get_started/example_project.md
+++ b/docs/source/get_started/example_project.md
@@ -121,6 +121,6 @@ These are the node function within `src/get_started/nodes.py`:
 | Split data      | Splits the example [Iris dataset](https://www.kaggle.com/uciml/iris) into train and test samples | `split_data`       |
 | Make Predictions| Makes class predictions using 1-nearest neighbour classifier and train-test set                  | `make_predictions` |
 | Report accuracy | Reports the accuracy of the predictions performed by the previous node                           | `report_accuracy`  |
- 
+

 The file `src/pipeline_registry.py` creates and collates into a single pipeline, resolving node execution order from the input and output data dependencies between the nodes.
diff --git a/docs/source/tutorial/visualise_pipeline.md b/docs/source/tutorial/visualise_pipeline.md
index ee12a2b0..f57ed20d 100644
--- a/docs/source/tutorial/visualise_pipeline.md
+++ b/docs/source/tutorial/visualise_pipeline.md
@@ -113,7 +113,7 @@ You need to update `requirements.txt` in your Kedro project and add the followin

 You can view Plotly charts in Kedro-Viz when you use Kedro's plotly datasets.

-There are two types of Plotly datasets in Kedro, the `plotly.PlotlyDataSet` and `plotly.JSONDataSet`. 
+There are two types of Plotly datasets in Kedro, the `plotly.PlotlyDataSet` and `plotly.JSONDataSet`.
 ### [`plotly.PlotlyDataSet`](https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.plotly.PlotlyDataSet.html#kedro.extras.datasets.plotly.PlotlyDataSet)

 To use this dataset you need to configure your plot in the `catalog.yml`. This dataset only supports [Plotly Express](https://plotly.com/python/plotly-express).

Your Environment

pyenv 2.2.3-4-g971397dd Python 3.10.0

-e git+ssh://git@github.com/kedro-org/kedro.git@2bd2ee27bea25648e60495229a2ef6efaa59a4e2#egg=kedro

Ubuntu 20.04.3 LTS Linux daniel-desktop 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

daniel-falk commented 2 years ago

Here is the full output from the test: kedro-test.txt

noklam commented 2 years ago

Could you check which version of the moto library is installed? It is used to mock S3 storage in the tests.
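
One way to check the installed version from Python is sketched below. It uses the standard library's importlib.metadata; the helper name installed_version is made up for this example and is not part of Kedro or moto.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is a hypothetical helper for this example only.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the moto version in the active environment, or None if moto is missing.
print("moto:", installed_version("moto"))
```

Running pip freeze in the same virtual environment should report the same version.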

daniel-falk commented 2 years ago

Thanks for the quick answer!

Here's the installed version of moto:

moto==3.0.4

Here's the full pip freeze: kedro-freeze.txt

noklam commented 2 years ago

commit: b1710fa933817a303937f545982b1d75a18de505

All my matplotlib tests pass with Python 3.10.0. @daniel-falk

Package                       Version     Location
----------------------------- ----------- --------------------------------
absl-py                       1.0.0
adal                          1.2.7
adlfs                         2022.2.0
aiohttp                       3.8.1
aiosignal                     1.2.0
anyconfig                     0.10.1
anyio                         3.6.1
appnope                       0.1.3
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
arrow                         1.2.2
aspy.yaml                     1.3.0
astroid                       2.11.5
astunparse                    1.6.3
async-timeout                 4.0.2
attrs                         21.4.0
azure-core                    1.24.0
azure-datalake-store          0.0.52
azure-identity                1.10.0
azure-storage-blob            12.12.0
Babel                         2.10.1
backcall                      0.2.0
bandit                        1.7.4
beautifulsoup4                4.11.1
behave                        1.2.6
binaryornot                   0.4.4
biopython                     1.79
black                         22.3.0
blacken-docs                  1.9.2
bleach                        5.0.0
bokeh                         2.4.3
boto3                         1.23.10
botocore                      1.26.10
cachetools                    4.2.4
certifi                       2022.5.18.1
cffi                          1.15.0
cfgv                          3.3.1
chardet                       4.0.0
charset-normalizer            2.0.12
click                         8.1.3
click-plugins                 1.1.1
cligj                         0.7.2
cloudpickle                   2.1.0
compress-pickle               1.2.0
cookiecutter                  1.7.3
coverage                      6.4
cryptography                  37.0.2
cycler                        0.11.0
dask                          2021.12.0
db-dtypes                     1.0.1
debugpy                       1.6.0
decorator                     5.1.1
defusedxml                    0.7.1
delta-spark                   1.2.1
Deprecated                    1.2.13
dill                          0.3.5.1
distlib                       0.3.4
distributed                   2021.12.0
docopt                        0.6.2
dynaconf                      3.1.8
entrypoints                   0.4
et-xmlfile                    1.1.0
execnet                       1.9.0
fastjsonschema                2.15.3
filelock                      3.7.0
Fiona                         1.8.21
flatbuffers                   1.12
fonttools                     4.33.3
frozenlist                    1.3.0
fsspec                        2022.1.0
gast                          0.4.0
gcsfs                         2022.1.0
geopandas                     0.10.2
gitdb                         4.0.9
gitdb2                        4.0.2
GitPython                     3.0.6
google-api-core               2.8.1
google-auth                   2.6.6
google-auth-oauthlib          0.4.6
google-cloud-bigquery         3.1.0
google-cloud-bigquery-storage 2.13.1
google-cloud-core             2.3.0
google-cloud-storage          2.3.0
google-crc32c                 1.3.0
google-pasta                  0.2.0
google-resumable-media        2.3.3
googleapis-common-protos      1.56.2
greenlet                      1.1.2
grimp                         1.2.3
grpcio                        1.46.3
grpcio-status                 1.46.3
h5py                          3.7.0
hdfs                          2.7.0
HeapDict                      1.0.1
holoviews                     1.13.5
identify                      2.5.1
idna                          3.3
import-linter                 1.2.6
importlib-metadata            4.11.4
iniconfig                     1.1.1
ipykernel                     6.13.0
ipython                       7.34.0
ipython-genutils              0.2.0
ipywidgets                    7.7.0
isodate                       0.6.1
isort                         5.10.1
jedi                          0.18.1
Jinja2                        3.0.3
jinja2-time                   0.2.0
jmespath                      0.10.0
joblib                        1.1.0
json5                         0.9.8
jsonschema                    4.5.1
jupyter                       1.0.0
jupyter-client                7.3.1
jupyter-console               6.4.3
jupyter-core                  4.10.0
jupyter-server                1.17.0
jupyterlab                    3.4.2
jupyterlab-pygments           0.2.2
jupyterlab-server             2.14.0
jupyterlab-widgets            1.1.0
kedro                         0.18.1      /Users/Nok_Lam_Chan/GitHub/kedro
keras                         2.9.0
Keras-Preprocessing           1.1.2
kiwisolver                    1.4.2
lazy-object-proxy             1.7.1
libclang                      14.0.1
locket                        1.0.0
lxml                          4.8.0
lz4                           4.0.1
Markdown                      3.3.7
MarkupSafe                    2.1.1
matplotlib                    3.5.2
matplotlib-inline             0.1.3
mccabe                        0.7.0
memory-profiler               0.60.0
mistune                       0.8.4
moto                          3.0.4
msal                          1.17.0
msal-extensions               1.0.0
msgpack                       1.0.3
msrest                        0.6.21
multidict                     6.0.2
munch                         2.5.0
mypy-extensions               0.4.3
nbclassic                     0.3.7
nbclient                      0.6.3
nbconvert                     6.5.0
nbformat                      5.4.0
nest-asyncio                  1.5.5
networkx                      2.8.2
nodeenv                       1.6.0
notebook                      6.4.11
notebook-shim                 0.1.0
numexpr                       2.8.1
numpy                         1.22.4
oauthlib                      3.2.0
openpyxl                      3.0.10
opt-einsum                    3.3.0
packaging                     21.3
pandas                        1.4.2
pandas-gbq                    0.17.5
pandocfilters                 1.5.0
panel                         0.13.1
param                         1.12.1
parse                         1.19.0
parse-type                    0.6.0
parso                         0.8.3
partd                         1.2.0
pathspec                      0.9.0
pbr                           5.9.0
pep517                        0.12.0
pexpect                       4.8.0
pickleshare                   0.7.5
Pillow                        9.1.1
pip                           21.2.4
pip-tools                     6.6.2
platformdirs                  2.5.2
plotly                        5.8.0
pluggy                        1.0.0
portalocker                   2.4.0
poyo                          0.5.0
pre-commit                    1.21.0
prometheus-client             0.14.1
prompt-toolkit                3.0.29
proto-plus                    1.20.5
protobuf                      3.19.4
psutil                        5.8.0
ptyprocess                    0.7.0
py                            1.11.0
py4j                          0.10.9.3
pyarrow                       6.0.1
pyasn1                        0.4.8
pyasn1-modules                0.2.8
pycparser                     2.21
pyct                          0.4.8
pydata-google-auth            1.4.0
Pygments                      2.12.0
PyJWT                         2.4.0
pylint                        2.13.9
pyparsing                     3.0.9
pyproj                        3.3.1
pyrsistent                    0.18.1
pyspark                       3.2.1
pytest                        6.2.5
pytest-cov                    3.0.0
pytest-forked                 1.4.0
pytest-mock                   1.13.0
pytest-xdist                  2.2.1
python-dateutil               2.8.2
python-json-logger            2.0.2
python-slugify                6.1.2
pytz                          2022.1
pyviz-comms                   2.2.0
PyYAML                        6.0
pyzmq                         23.0.0
qtconsole                     5.3.0
QtPy                          2.1.0
redis                         4.3.1
requests                      2.27.1
requests-mock                 1.9.3
requests-oauthlib             1.3.1
responses                     0.21.0
rope                          0.21.1
rsa                           4.8
s3fs                          0.4.2
s3transfer                    0.5.2
Send2Trash                    1.8.0
setuptools                    61.2.0
Shapely                       1.8.2
six                           1.16.0
smmap                         5.0.0
sniffio                       1.2.0
sortedcontainers              2.4.0
soupsieve                     2.3.2.post1
SQLAlchemy                    1.4.36
stevedore                     3.5.0
tables                        3.7.0
tblib                         1.7.0
tenacity                      8.0.1
tensorboard                   2.9.0
tensorboard-data-server       0.6.1
tensorboard-plugin-wit        1.8.1
tensorflow                    2.9.1
tensorflow-estimator          2.9.0
tensorflow-io-gcs-filesystem  0.26.0
termcolor                     1.1.0
terminado                     0.15.0
test-plugin                   0.1
test-plugin333                0.3
text-unidecode                1.3
tinycss2                      1.1.1
toml                          0.10.2
tomli                         2.0.1
toolz                         0.11.2
toposort                      1.7
tornado                       6.1
tqdm                          4.64.0
traitlets                     5.2.1.post0
truffleHog                    2.2.1
truffleHogRegexes             0.0.7
typing_extensions             4.2.0
urllib3                       1.26.9
virtualenv                    20.14.1
wcwidth                       0.2.5
webencodings                  0.5.1
websocket-client              1.3.2
Werkzeug                      2.1.2
wheel                         0.37.1
widgetsnbextension            3.6.0
wrapt                         1.14.1
XlsxWriter                    1.4.5
xmltodict                     0.13.0
yarl                          1.7.2
zict                          2.2.0
zipp                          3.8.0

daniel-falk commented 2 years ago

I managed to get it working by removing my ~/.aws/config file. It seems like the issue stems from setting a region for the default profile (in my case eu-north-1). Do you have anything in your aws config?

Perhaps I should raise this issue in the moto repo instead.
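
A less invasive alternative to deleting ~/.aws/config might be to shield the test run from it with environment variables. The sketch below is an assumption, not something Kedro's test suite is confirmed to do: the context manager name isolated_aws_env is hypothetical, the region us-east-1 and the dummy credentials are arbitrary choices, and it assumes botocore treats the empty file at os.devnull as "no configuration". AWS_CONFIG_FILE, AWS_SHARED_CREDENTIALS_FILE and AWS_DEFAULT_REGION are the standard variables that botocore reads.

```python
import os
from contextlib import contextmanager


@contextmanager
def isolated_aws_env(region="us-east-1"):
    """Temporarily hide the user's AWS config from code run inside the block.

    Hypothetical helper: the name, the default region and the dummy
    credentials are assumptions made for this sketch.
    """
    overrides = {
        # Point botocore at an empty file instead of ~/.aws/config.
        "AWS_CONFIG_FILE": os.devnull,
        "AWS_SHARED_CREDENTIALS_FILE": os.devnull,
        # The environment variable takes precedence over the profile's region.
        "AWS_DEFAULT_REGION": region,
        # Dummy credentials so that no real account can be touched.
        "AWS_ACCESS_KEY_ID": "testing",
        "AWS_SECRET_ACCESS_KEY": "testing",
    }
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Restore the previous state, removing variables that were unset before.
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
```

The same variables can also be exported in the shell before running make test. Whether this resolves the failures above has not been verified in this thread.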

noklam commented 2 years ago

I don't have one. It's probably not a bug, though, as this is how AWS config works.

noklam commented 2 years ago

@daniel-falk I am closing it now and looking forward to your PR. :)