pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
RLS: pandas 1.5.1 #48991

Closed datapythonista closed 2 years ago

datapythonista commented 2 years ago

It's been more than two weeks since the release of pandas 1.5.0. There have been several fixes since then, so it's probably worth releasing a 1.5.1.

I see we've got some regressions for 1.5.1: https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3ARegression+milestone%3A1.5.1

Is there any that should be fixed before a 1.5.1?

phofl commented 2 years ago

We should pin/fix the pydata sphinx theme issue and #48987 is ready

MarcoGorelli commented 2 years ago

shall we just revert https://github.com/pandas-dev/pandas/pull/46174 to solve https://github.com/pandas-dev/pandas/issues/48826 ?

mroeschke commented 2 years ago

We can discuss more during the dev call next week, but IMO the existing regressions don't appear to be absolute blockers for a 1.5.1 release

datapythonista commented 2 years ago

Waiting for #49032 and #49137 and will release pandas 1.5.1. I moved all other PRs in the milestone to 1.5.2.

MarcoGorelli commented 2 years ago

Looks like #49032 isn't quite ready - shall we just revert the original PR, and then, get the (fixed) performance improvement in for 2.0.0?

Have opened https://github.com/pandas-dev/pandas/pull/49140 if we just want to revert

phofl commented 2 years ago

+1 on reverting for now

GYHHAHA commented 2 years ago

Sounds good. Just revert that now and consider moving PERF to 2.0.0.

mroeschke commented 2 years ago

Thanks @MarcoGorelli for the revert in https://github.com/pandas-dev/pandas/issues/48826; just merged it so we should be good to go once the backport PR is merged.

datapythonista commented 2 years ago

I'll now start the tests for the release, and do the release afterwards if everything looks good. I'll keep you updated here.

datapythonista commented 2 years ago

Our step to build the sdist is failing because of a problem in conda. Looks like the blas package downloaded from Anaconda is corrupt.

This is the package in repodata.json:

    "blas-1.0-openblas.tar.bz2": {
      "build": "openblas",
      "build_number": 7,
      "depends": [],
      "md5": "671f6e045d03a842f379145e7fc43c0d",
      "name": "blas",
      "sha256": "ed246aa0bf2809030f911cefd5b16a2c4de0d1b21f75f76e5178122d2213727e",
      "size": 49374,
      "subdir": "linux-64",
      "timestamp": 1528224118045,
      "track_features": "nomkl",
      "version": "1.0"
    },

But the downloaded file is significantly smaller, and sha256 checksum doesn't match: https://repo.anaconda.com/pkgs/main/linux-64/blas-1.0-openblas.tar.bz2

This seems to be a global problem, and not limited to the CDN instance my network reaches, since I tried with a VPN and the problem persists.
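For anyone who wants to reproduce the check, here is a minimal sketch of comparing a downloaded package against the `size` and `sha256` fields from `repodata.json`. The function name `verify_package` and the local file path are hypothetical; only `hashlib` and `os` from the standard library are used, and this is not necessarily how conda itself performs the verification.

```python
import hashlib
import os


def verify_package(path, expected_sha256, expected_size):
    """Return a list of mismatch descriptions (empty if the file looks intact).

    Hypothetical helper: compares a local file against the "sha256" and
    "size" values published for it in repodata.json.
    """
    problems = []

    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(f"size: expected {expected_size}, got {actual_size}")

    # Hash in chunks so large packages do not need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        problems.append("sha256 mismatch")

    return problems
```

With the values above, the call would look like `verify_package("blas-1.0-openblas.tar.bz2", "ed246aa0bf2809030f911cefd5b16a2c4de0d1b21f75f76e5178122d2213727e", 49374)`, assuming the file was saved under that name in the current directory. A non-empty result means the download does not match the index.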

I'm able to create the sdist without problems when I don't use the docker image from our release instructions. An environment with just numpy and cython from conda-forge, using the same blas=1.0=openblas package but from conda-forge, seems to be working fine, and I guess the resulting sdist should be the same.

Does anyone know if there is a reason to use the pandas-release docker with all its dependencies from the Anaconda channel, instead of using my local environment with those 3 dependencies?

phofl commented 2 years ago

It also looks quite old, the newest version on conda-forge is 2.116

datapythonista commented 2 years ago

> It also looks quite old, the newest version on conda-forge is 2.116

I don't think it even makes much of a difference for building the sdist. It feels like only setup.py is executed for it, and numpy is only used to call numpy.get_include(), which returns the directory containing the C headers of numpy. So, as long as numpy can be imported and is up to date, I don't think any of its dependencies really matter.
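As a small illustration of the point about `numpy.get_include()`, the sketch below looks up the header directory a build script would pass to the compiler. The wrapper `numpy_include_dir` is a hypothetical name, and the fallback to `None` when NumPy is missing is an assumption added so the snippet runs anywhere; it is not taken from pandas' setup.py.

```python
import importlib.util
import os


def numpy_include_dir():
    """Return the directory holding NumPy's C headers, or None if NumPy is absent."""
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy

    # numpy.get_include() returns a filesystem path; BLAS is not involved.
    return numpy.get_include()


include_dir = numpy_include_dir()
if include_dir is not None:
    print(include_dir, os.path.isdir(include_dir))
```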

But there is surely no reason that I know of to use that old blas and the Anaconda channel. If there are no more comments I'll move forward building the sdist with a fresh local environment, and not from the pandas-build docker image in the pandas-release repo.

datapythonista commented 2 years ago

Pushing tag and starting wheel builds.

datapythonista commented 2 years ago

GitHub release and PyPI packages are now ready. There seems to be a problem with Sphinx, where workers don't seem to finish after the docs are built. Will give it one more try, and will build the docs for the release manually if the job continues to fail.

conda-forge should now be able to detect the new package and automatically open the PR to get the new release there. Will wait for it to happen and merge it.

datapythonista commented 2 years ago

Docs are updated. We seem to have problems when using Sphinx with multiple workers, happens in the CI and also happened to me locally. Things seem fine when using a single job.
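A minimal sketch of forcing a single-job docs build, assuming a plain `sphinx-build` invocation. The `-j` option sets the number of parallel jobs in Sphinx; the helper name `sphinx_html_command` and the `doc/source` and `doc/build/html` paths are assumptions for illustration, and the pandas release scripts may drive the build differently.

```python
import subprocess


def sphinx_html_command(source_dir, build_dir, jobs=1):
    """Build the argument list for an HTML docs build with a fixed job count."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["sphinx-build", "-b", "html", "-j", str(jobs), source_dir, build_dir]


# Assumed paths; adjust to the actual layout of the docs.
cmd = sphinx_html_command("doc/source", "doc/build/html", jobs=1)
print(" ".join(cmd))
# To actually run the build (requires Sphinx to be installed):
# subprocess.run(cmd, check=True)
```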

Still waiting for the automatic conda-forge PR. I'll wait some more, and open it manually if it doesn't seem it'll be generated automatically.

datapythonista commented 2 years ago

PR for conda-forge created: https://github.com/conda-forge/pandas-feedstock/pull/143

I'll wait for the CI to complete, and merge it.

lithomas1 commented 2 years ago

@datapythonista Can you also tag v2.0.0dev0 while you're at it too? Thanks.

datapythonista commented 2 years ago

> @datapythonista Can you also tag v2.0.0dev0 while you're at it too? Thanks.

Sure, I'll do it, but I prefer to finish the release first, even though tagging shouldn't affect it.

Seems like we've got an error in the arm builds in conda-forge. Seems related to this, having a look to see if I can find the problem.

datapythonista commented 2 years ago

Seems like the errors are a known issue with pypy. I'll wait for the last jobs to finish (the ppc64le jobs), and if they are fine, I'll merge the PR.

datapythonista commented 2 years ago

Tag v2.0.0.dev0 created at the same location as v1.6.0.dev0. Versioneer seems to be using the new tag.
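To show why the tag matters, here is a rough sketch of how a development version string can be derived from `git describe --tags --long` output. It is a simplified approximation of Versioneer's "pep440" style, not its actual code. The function name `version_from_describe` and the sample commit hashes are made up, and whether pandas uses exactly this style is an assumption.

```python
import re

# Shape of `git describe --tags --long [--dirty]` output: TAG-DISTANCE-gHEX[-dirty]
_DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def version_from_describe(describe, tag_prefix="v"):
    """Turn describe output into TAG[+DISTANCE.gHEX[.dirty]] (simplified)."""
    match = _DESCRIBE_RE.match(describe)
    if match is None:
        raise ValueError(f"unrecognised describe output: {describe!r}")
    tag = match["tag"]
    if not tag.startswith(tag_prefix):
        raise ValueError(f"tag {tag!r} does not start with {tag_prefix!r}")

    version = tag[len(tag_prefix):]
    distance = int(match["distance"])
    dirty = match["dirty"] is not None
    if distance or dirty:
        # A local version segment starts with "+", or "." if one already exists.
        version += "." if "+" in version else "+"
        version += f"{distance}.g{match['sha']}"
        if dirty:
            version += ".dirty"
    return version


print(version_from_describe("v2.0.0.dev0-5-gabc1234"))  # 2.0.0.dev0+5.gabc1234
```

Under this scheme, commits made after the new tag report a version starting with 2.0.0.dev0, since the version is built from the tag that `git describe` reports.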

CC: @lithomas1 @phofl @mroeschke

phofl commented 2 years ago

Thanks

MarcoGorelli commented 2 years ago

thanks @datapythonista !

all good to start merging deprecation PRs then?

phofl commented 2 years ago

Yep

lithomas1 commented 2 years ago

> Tag v2.0.0.dev0 created at the same location as v1.6.0.dev0. Versioneer seems to be using the new tag.
>
> CC: @lithomas1 @phofl @mroeschke

Thanks.

MarcoGorelli commented 2 years ago

Looks like there's a release on PyPI, the docs are updated, it's available on conda-forge

OK to close the issue and announce?

datapythonista commented 2 years ago

Yep, sorry, it was my nighttime before I could make the final verifications and the announcements.

All done now, I checked the release packages for my architecture, and made the announcements on the pydata and pandas-dev mailing lists, Telegram, Twitter and Slack.

Closing this. I created #49194 for 1.5.2, targeting November for the release.

phofl commented 2 years ago

Can we close the 1.5.1 milestone?

datapythonista commented 2 years ago

> Can we close the 1.5.1 milestone?

Yep, forgot about it, thanks for the heads up. I moved all issues in the milestone 1.5.1 to 1.5.2 and closed 1.5.1.