Closed marenwestermann closed 1 year ago
ping @noatamir
I just removed my company's pypi from the list of indexes but the errors in the installation process remain the same. After removing my company's pypi I ran from my pandas-dev conda environment:
make clean
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517
Hi, thanks for your report.
Can you post a couple of failures? There is definitely something off with your installation. The test suite is passing in general, except for one test that is currently failing due to a dependency update.
Did the Cython extensions build correctly?
What's the commit you are on?
Hey
Regarding the version, I get
>>> pandas.__version__
'1.5.0.dev0+1277.g08fd9c0c0c'
which still looks like the old pattern (in the docs)
Could you also show the output of git describe, please? Did you do a full clone of the pandas repo?
Hi @marenwestermann
If we want to mention potentially failing tests, I would word this in a way that makes it clear that new releases of dependencies might cause this (so that people know what to check if they want to debug).
Regarding the "0+untagged.29914.g08fd9c0" version number, would it be possible that you cloned the pandas git repo with a limited "depth"? Since our version numbers are based on git tags, to have a proper number you need at least the last release in your git history.
Thanks a lot for all your input!
would it be possible that you cloned the pandas git repo with a limited "depth"?
No, I didn't clone the pandas repo with a limited depth.
I synced my copy with upstream/main and I am now on commit 60b4400. The output of git describe is v1.5.0.dev0-1285-g60b4400491. I followed the instructions in "Creating a Python environment" again and this time I got the following error:
➜ pandas git:(main) conda env create -f environment.yml
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- pandoc
I investigated the issue and the problem comes from conda-forge/pandoc-feedstock. The support for my processor (osx-arm64) has currently been disabled. It worked previously since I happened to install my local environment exactly during the period where the binary of the package was available on conda-forge.
I then moved pandoc to the pip install section and also added pip to the list of environment packages. I did this by modifying the environment.yml file as follows (from line 128):
  # build the interactive terminal
  - jupyterlab >=3.4,<4
  - pip
  - pip:
    - jupyterlite==0.1.0b10
    - sphinx-toggleprompt
    - pandoc
After creating a new environment using the modified environment file, I then ran:
make clean
python setup.py build_ext -j 10
python -m pip install -e . --no-build-isolation --no-use-pep517
and the installation was successful. My version of pandas is now this one: 1.5.0.dev0+1285.g60b4400491.dirty.
I then ran:
pytest pandas -n auto
and I got 44 test failures. Here are some examples of the failures I got:
Regarding the pandoc installation issue: I don't think the PyPI package for pandoc actually includes pandoc itself. AFAIK it's only the Python bindings, and you still need to install the library yourself (there are no wheels for this package that include it). So it might not be as simple as moving it to the pip section (although of course, as long as you are not using it, that's fine for getting a dev environment set up). More generally, we should maybe check whether we actually need pandoc, since it is not that easy to install. From a quick look, I think we only use it for converting the notebooks in our docs with nbsphinx (in the past we also used it to convert contributing.rst to md, but that seems to have been removed). It might be viable to do that without pandoc. For example, myst-parser and myst-nb also allow including markdown files and notebooks in Sphinx documentation, without a dependency on pandoc AFAIK. (But that is for another issue!)
@marenwestermann for the test failures: I think you only posted part of a single test failure traceback (those tend to be very long with pytest ..). Can you update that comment with some more output?
I think the general goal should be that there are no test failures if you have a full development environment (except for some things like no network connection).
Hey, just reading through this again
Looks like the first two issues brought up got resolved when the installation went through correctly. pandoc is just used for a single notebook (doc/source/user_guide/style.ipynb), so it's definitely not essential.
Regarding running all tests, I'd suggest just adding a note there saying something like
If a handful of tests don't pass, it may not be an issue with your pandas installation. Some tests (e.g. some SQLAlchemy ones) require additional setup, others might start failing because a non-pinned library released a new version, and others might be flaky if run in parallel. As long as you can import pandas from your locally built version, your installation's probably fine and you can start contributing!
Sorry for not having followed up on this in the last few weeks. I'll have time to keep working on this from Friday this week. Thanks for your review @MarcoGorelli! I'll incorporate your feedback.
I finally had a look into this again. :) First of all, thank you very much @MarcoGorelli for updating the contributing guidelines. The documentation has much improved!
I followed the "Creating a development environment" instructions again. I'm now using my personal computer which has a Linux OS with the x86_64 architecture. (The Macbook mentioned above was my work computer which I had to return because I finished working at my last company).
I'm observing similar problems as described in my initial issue description at the top. I made a fresh clone of my fork of the pandas project and followed the instructions under Option 1a: using mamba (recommended). When I run python -m pip install -e . --no-build-isolation --no-use-pep517 I get the following errors:
This is the content of my conda environment:
When I do a git describe the output is "fatal: No names found, cannot describe anything." This implies that I might have created a shallow clone as already suggested by @jorisvandenbossche, however, I did a regular git clone without the --depth option. I am at the latest commit (f2a91a0ed8c2f9198b39860c987a59cbdbcd9999).
I haven't spent much time investigating the issue yet, but will do this now. I just thought I'd post an update as you might have ideas why this is happening, especially given that I'm now on a different machine with a different architecture.
Thanks. No idea, I'm afraid. I tried again in a new repo, also on Linux x86_64, and it worked fine. But if git describe fails, that suggests to me that the clone didn't work properly. I'd try deleting the repo and cloning it again, something like
cd ~/open-source
rm -rf pandas-maren
conda env remove -n pandas-dev
git clone git@github.com:pandas-dev/pandas.git pandas-maren
cd pandas-maren
git describe
If that looks fine, then
mamba env create
conda activate pandas-dev
python setup.py build_ext -j 12 # or 4, or however many cores you have / want to use
python -m pip install -e . --no-build-isolation --no-use-pep517
When I do a git describe the output is "fatal: No names found, cannot describe anything." This implies that I might have created a shallow clone as already suggested by @jorisvandenbossche, however, I did a regular git clone without the --depth option. I am at the latest commit (f2a91a0ed8c2f9198b39860c987a59cbdbcd9999).
I think this error indicates that you simply don't have any tags in your local clone. That also matches with the version you see "pandas-0+untagged.30317.gf2a91a0", so it seems you did a full clone (otherwise it wouldn't know there are 30317 commits since the start of pandas), but without downloading any tags.
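To make the link between tags and the version string concrete, here is a rough sketch of how a versioneer-style scheme could turn git metadata into the strings seen in this thread. It is a simplified illustration, not the actual pandas build code: the function name render_version and its parameters are made up for this example.

```python
from typing import Optional


def render_version(tag: Optional[str], distance: int, short_sha: str,
                   dirty: bool = False) -> str:
    """Build a version string from git metadata (hypothetical helper).

    tag       -- closest reachable tag such as "v1.5.0.dev0", or None if the
                 clone has no tags at all
    distance  -- commits since that tag, or total commit count when tag is None
    short_sha -- abbreviated commit hash
    dirty     -- True if the working tree has uncommitted changes
    """
    if tag is None:
        # Without tags there is no release to anchor on, so only the commit
        # count and hash can be reported.
        version = f"0+untagged.{distance}.g{short_sha}"
    else:
        # Drop a single leading "v" from the tag name.
        version = tag[1:] if tag.startswith("v") else tag
        if distance or dirty:
            # Use "." instead of "+" if the tag already has a local part.
            sep = "." if "+" in version else "+"
            version += f"{sep}{distance}.g{short_sha}"
    if dirty:
        version += ".dirty"
    return version


# Values taken from this thread:
print(render_version("v1.5.0.dev0", 1285, "60b4400491", dirty=True))
print(render_version(None, 30317, "f2a91a0"))
```

With a tag available the result starts with the release number. Without any tags it degrades to the "0+untagged" form, even though the full commit history is present.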
It is still strange how you got that, because the default for git clone is to also download the tags. I could reproduce your error by explicitly asking to not get the tags:
$ git clone https://github.com/pandas-dev/pandas.git pandas-test --no-tags
Cloning into 'pandas-test'...
...
$ cd pandas-test
pandas-test$ git describe
fatal: No names found, cannot describe anything.
And if doing a shallow clone (with --depth 1 instead of --no-tags), I get the same error, but then you don't have those 30317 commits in the history.
If you don't have tags, you can still get them after cloning (https://stackoverflow.com/a/60883893/653364). But it is strange that you don't have them to begin with. One thing I can think of is that for some reason you have git configured to not fetch tags by default.
Thank you @MarcoGorelli and @jorisvandenbossche for your replies! I finally managed to successfully install the development version of pandas, hooray! The pandas version that I now have is 2.0.0.dev0+398.g52acf970cd. I followed the instructions given by @MarcoGorelli above. The only thing I did differently compared to before is that I cloned directly from the pandas repo (as instructed) instead of from my own fork. Last time I did sync my fork with the main repository before cloning. However, it seems like there was something wrong with my fork. I'm now proceeding to check the tests.
The only thing I did differently compared to before is that I cloned directly from the pandas repo (as instructed) instead of from my own fork.
Aha, that explains it! It seems that github nowadays doesn't include tags when forking: https://github.com/marenwestermann/pandas/tags is empty.
So if you then clone from your fork, you indeed don't have tags, and then it is expected you get this pandas-0+untagged.... version.
We should probably mention that somewhere in our contributing docs as a FAQ / gotcha that if you run into this kind of version string, this is probably the reason.
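As a sketch of what such a FAQ check could look like in code, the snippet below flags the telltale version string and points at the likely cause. The helper name missing_tags_hint is hypothetical, and the "upstream" remote name is assumed from the fork setup described in the contributing docs.

```python
from typing import Optional


def missing_tags_hint(version: str) -> Optional[str]:
    """Return a hint if the version string suggests a clone without tags.

    Hypothetical helper for illustration; not part of pandas.
    """
    if version.startswith("0+untagged."):
        return (
            "Version looks untagged: your clone probably has no git tags "
            "(for example because it was cloned from a fork). "
            "Try fetching from the upstream remote and rebuilding."
        )
    return None


print(missing_tags_hint("0+untagged.29914.g08fd9c0"))
print(missing_tags_hint("2.0.0.dev0+398.g52acf970cd"))
```

In practice the argument would be pandas.__version__ from the locally built installation.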
I think it should be enough to just add
git fetch upstream
to the end of this section
git clone https://github.com/your-user-name/pandas.git pandas-yourname
cd pandas-yourname
git remote add upstream https://github.com/pandas-dev/pandas.git
here https://pandas.pydata.org/docs/dev/development/contributing.html#forking
I saw that the documentation has been further improved, which is great to see. I ran pytest pandas again and all the test failures I got were related to SQL, which was expected given that I didn't set up a database:
Two examples of full test failures can be seen here: https://gist.github.com/marenwestermann/b27f060f2c95697fdcf8cf83402712a8 https://gist.github.com/marenwestermann/a52886e068ce0b7894051d8bbd9ddf06
I then ran pytest pandas/tests/io/test_sql.py -m "not db" in order to check if I get any test failures this way, but all tests were successful. The "not db" marker expression is mentioned in the documentation here, but only in relation to speed. Should it be explicitly mentioned in the documentation that pytest pandas will result in test failures if people didn't set up a database?
I also ran pytest pandas -n auto to check if parallelisation causes problems but I currently don't get any additional test failures. (I did get an additional test failure related to memory usage recently but that seems to be fixed now).
Should I open an issue for mentioning the tag problem in the contributing docs as suggested by @jorisvandenbossche or is there already one?
Last but not least, sorry for the slow communication. I changed jobs recently and am slowly getting a bit more settled in, which means I have a bit more time again now and can respond quicker. :)
I just followed the updated "Contributing to pandas" documentation which now includes git fetch upstream in this section (see comment by @MarcoGorelli above) and the installation process was successful. So I guess there is no need for mentioning the tag problem that I ran into earlier in the documentation.
Yeah agreed - if you want to add the note suggested in https://github.com/pandas-dev/pandas/issues/48060#issuecomment-1260641776, I think that'd be helpful. Else I think we can close, I've opened https://github.com/pandas-dev/pandas/issues/49797 mentioning the pandoc issue
As I mentioned before, I think our goal should still be that the tests "pass" by default (if you followed the instructions to create a dev environment), meaning that the SQL tests should automatically skip if you don't have a database set up.
I don't know if that is easy (because we also don't want to silently skip those tests on CI if for example something goes wrong with the database set up, so we would need a way to force them being run on CI?)
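One possible shape for that trade-off, purely as a sketch: skip database tests locally when no database is reachable, but turn the same situation into a hard failure when an opt-in environment variable is set on CI. The function name db_test_action and the variable name FORCE_DB_TESTS are hypothetical and not existing pandas settings.

```python
import os
from typing import Mapping


def db_test_action(db_reachable: bool,
                   env: Mapping[str, str] = os.environ) -> str:
    """Decide what a database-backed test should do: 'run', 'skip' or 'fail'.

    FORCE_DB_TESTS is a made-up variable name used only for this sketch.
    """
    if db_reachable:
        return "run"
    if env.get("FORCE_DB_TESTS") == "1":
        # On CI a missing database is a setup problem and must not be hidden.
        return "fail"
    # Locally, a missing database just means the optional setup was not done.
    return "skip"


print(db_test_action(False, {}))                       # local, no database
print(db_test_action(False, {"FORCE_DB_TESTS": "1"}))  # CI, database missing
```

A test fixture could map 'skip' to pytest.skip and 'fail' to pytest.fail, so contributors see skips while CI still catches a broken database setup.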
Agreed, but in the meantime I think it's helpful to have a note so that new contributors aren't confused if they see some test failures
Certainly! I just want to ensure we keep track of the fact that we can actually improve the situation. I will open a new issue for it.
Sorry, I deleted my comment from the link above, what I wanted to say though was that I also had the same issue as Maren where the tags weren't updated because I wasn't forking properly, and the conversation above saved me.
Pandas version checks
main
here
Location of the documentation
https://pandas.pydata.org/docs/dev/development/contributing_environment.html#creating-a-python-environment
Documentation problem
My machine is a MacBook Pro with chip Apple M1 Pro, I'm using conda, and I created an environment without Docker. I checked my version of xcode-select and it's 2395, which seems to be the latest version. I followed the instructions in "Creating a Python environment". When I ran python -m pip install -e . --no-build-isolation --no-use-pep517 I got the following output with the following errors:
In the documentation for the development version name the example 0.22.0.dev0+29.g4ad6d4d74 is given, which seems quite old. I think the documentation should be updated here 1. because the current naming system doesn't seem to follow the old convention of including the version number in the name of the installed pandas package and 2. given the error messages it's not clear if the installation process worked correctly.
I ran pytest pandas to check if everything works correctly and the result was the following:
4445 failed, 179821 passed, 1715 skipped, 5498 xfailed, 11 xpassed, 221 warnings, 168 errors
I'm not sure if the number of test failures is due to incorrect installation or if the failing tests just need to be fixed.
This is the content of my pandas-dev conda environment:
Suggested fix for documentation
pytest pandas
and that these need to be fixed (unless this is due to an incorrect installation process on my machine).