Closed astrojuanlu closed 6 months ago
Notice that this is starting to creep up as we do dynamic inclusion of requirements.txt
in projects pyproject.toml
. This happened to me yesterday after installing a wheel coming from a kedro package
:
❯ pip install ./spaceflights-namespaces/dist/spaceflights_namespaces-0.1-py3-none-any.whl (advkedro310)
Looking in indexes: https://juan_luis_cano%40mckinsey.com:****@mckinsey.jfrog.io/artifactory/api/pypi/python/simple
Processing ./spaceflights-namespaces/dist/spaceflights_namespaces-0.1-py3-none-any.whl
...
WARNING: kedro-datasets 1.3.0 does not provide the extra 'pandas-csvdataset'
WARNING: kedro-datasets 1.3.0 does not provide the extra 'pandas-exceldataset'
WARNING: kedro-datasets 1.3.0 does not provide the extra 'pandas-parquetdataset'
Contents of requirements.txt
:
kedro-datasets[pandas.CSVDataSet, pandas.ExcelDataSet, pandas.ParquetDataSet]~=1.0
Extras with illegal names are not being pulled anymore in pip 23.3 onwards https://github.com/kedro-org/kedro-plugins/issues/553
This was already marked as high, but now it's affecting users for real cc @noklam @ankatiyar
The workaround for now is to tell users to install, say, kedro-datasets[pandas]
instead of kedro-datasets[pandas.ParquetDataset]
, since the former is legal and the latter isn't. But still, this would pull lots of unneeded dependencies.
Sorry for the delayed response, and thank you for looking into this. Since you've been able to recreate the problem, it doesn't sound like you still need me to post the stacktrace. I've been using pipenv to manage my environments. I'm actually a bit confused right now ... I just created a new kedro project. Piopenv sees the requirements.txt file and uses it. Previously this didn't seem to install pyarrow or fastparquet, and pandas displayed a warning message that one of those two packages was going to be a non-optional requirement. But my current project did in fact install pyarrow.
Another user have a problem. At first he has a bug where fastparquet
is installed but ParquetDataset
wouldn't work with a "file is closed" error. This is a legit bug itself.
But pyarrow
should be the industry standard by now, he arrives this error because pip install kedor-datasets[optional]
is missing the dependencies, and the error suggest user to install fastparquet
or pyarrow
.
The key decision here is: do we still want to provide an optional group per-dataset?
If yes, we have to decide a naming scheme.
If no, that's a breaking change (kedro-datasets 3.0?) but also the current non-standard names don't work anyway. And besides, the dataset groups already have legal names and we don't need to change those (kedro-datasets[pandas]
etc).
From the Airflow 2.8.2 release notes: https://github.com/apache/airflow/releases/tag/2.8.2
Airflow extras now get extras normalized to
-
(following PEP-685) instead of_
and.
(as it was before in some extras). When you install airflow with such extras (for exampledbt.core
orall_dbs
) you should use-
instead of_
and.
.
https://github.com/apache/airflow/pull/36537
This is exactly our problem.
Fixed by #570, will see the light in the next release 👍🏽
The root cause of the problems we had last week (see #306) with the optional dependency group names in
pyproject.toml
is thatkedro-datasets
extras names just don't align with Python packaging standards: https://packaging.python.org/en/latest/specifications/core-metadata/#provides-extra-multiple-useTherefore,
pip install kedro-datasets[pandas.CSVDataSet]
is out of line with current packaging standards.From https://peps.python.org/pep-0685/:
So, why was this working before? See also in the PEP:
The reason is: setuptools hasn't fully implemented PEP 685 yet: https://github.com/pypa/setuptools/issues/3586
So, our mangling of extras in
setup.py
before https://github.com/kedro-org/kedro-plugins/pull/263 worked, but after transitioning to self-referential extras, we fell victims of name normalization.I'm voting against switching to Poetry, Hatch, PDM, or any other system that allows this behavior, because it's going to bite us in the future. We need a short-term solution (either revert #263 or go for @DimedS #307) and a long term solution (possibly deprecating extras names with dots and offer an alternative syntax for our users).
Originally posted by @astrojuanlu in https://github.com/kedro-org/kedro-plugins/issues/306#issuecomment-1680094495