googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Upgrade Pandas to 2.2.2 #4870

Closed: metrizable closed this 1 month ago

metrizable commented 2 months ago

Upgrade to Pandas 2.2.2

The Pandas version pre-installed in the Colab runtime has been upgraded to 2.2.2. The 2.2.x series was first released in January 2024, and version 2.2.2 brings several new features, bug fixes, optimizations, and deprecations (see the pandas release notes).

Of note, Pandas 2.2.2 is compatible with a forthcoming upgrade to NumPy 2.0. We're also now pre-installing the optional bottleneck package, a collection of fast NumPy array functions that Pandas uses, when it is installed, to accelerate certain operations.
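A minimal sketch for checking the bottleneck setup in a runtime. It relies on pandas' compute.use_bottleneck option and pd.option_context; exactly which reductions pandas routes through bottleneck is an internal detail that varies by version, so using std() here is an assumed example rather than a guarantee:

```python
import importlib.util

import numpy as np
import pandas as pd

# Is the optional bottleneck package importable in this runtime?
has_bottleneck = importlib.util.find_spec("bottleneck") is not None
print("bottleneck installed:", has_bottleneck)

# pandas considers bottleneck only when it is installed AND this option is True.
print("compute.use_bottleneck:", pd.get_option("compute.use_bottleneck"))

# Float data with missing values; NaN-aware reductions such as std() are
# the kind of operation pandas may hand off to bottleneck (assumption).
s = pd.Series(np.random.default_rng(0).random(100_000))
s.iloc[::10] = np.nan  # every 10th value becomes missing

default_std = s.std()

# Temporarily disable bottleneck to force pandas' own NumPy-based path.
with pd.option_context("compute.use_bottleneck", False):
    numpy_std = s.std()

# Both code paths should agree up to floating-point rounding.
print(np.isclose(default_std, numpy_std))
```

The option_context block restores the previous setting on exit, so the comparison does not change the runtime's configuration.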

This upgrade keeps the Pandas version pre-installed in Colab current with the broader scientific computing ecosystem, and further upgrades are planned.

jlchang commented 1 month ago

FYI, Pandas 2.2.2 appears to have a plotting bug (it does not seem to be specific to Colab). In this tutorial, running albany['circulation'].plot() renders:

[Screenshot (2024-10-04, 5:12 AM): plot rendered by Pandas 2.2.2]

instead of:

[Screenshot (2024-10-04, 5:14 AM): expected plot]

Pandas 2.0.3 generates the expected plot (Pandas 2.2.3 is also affected). I filed an issue in the Pandas repo; the bug was confirmed and a workaround was provided. I'm flagging this because the absence of index sorting may affect more than just the plot example in the tutorial I cited.
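To illustrate why an unsorted index can matter beyond plotting, here is a minimal sketch with made-up yearly values (the series and its numbers are hypothetical, not the tutorial's albany data). Operations such as diff() work on row order rather than index order, so an unsorted index silently compares non-adjacent years:

```python
import pandas as pd

# Hypothetical yearly values stored out of chronological order.
s = pd.Series([120, 95, 140, 110], index=[1990, 1980, 2000, 1985])

# diff() subtracts the previous ROW, not the previous year,
# so these "year-over-year" changes are meaningless.
print(s.diff().tolist())  # [nan, -25.0, 45.0, -30.0]

# After sorting by index, consecutive rows are consecutive years.
print(s.sort_index().diff().tolist())  # [nan, 15.0, 10.0, 20.0]
```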

tueda commented 1 month ago

The current version of seaborn pre-installed in the runtime is 0.13.1, which has compatibility issues with pandas 2.2. See the seaborn 0.13.2 release notes.

Edit: To clarify, the pre-installed seaborn version in the Colab runtime should be upgraded to 0.13.2.

Further Edit: As of 11 Oct 2024, the seaborn version in the runtime is 0.13.2.
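To confirm which seaborn version a runtime actually has before relying on it with pandas 2.2, a sketch using only the standard library's importlib.metadata. The release_tuple and at_least helpers are hypothetical and deliberately simplified: they compare only the numeric release segment and ignore pre-release or local tags. For full PEP 440 semantics, the packaging library's Version class is the usual choice.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple[int, ...]:
    """Hypothetical helper: parse the leading numeric release segment.

    '0.13.2' -> (0, 13, 2); suffixes like 'rc1' or '+cu121' are dropped.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # this segment carried a suffix; stop parsing
    return tuple(parts)


def at_least(installed: str, required: str) -> bool:
    """Hypothetical helper: True if installed >= required (release segment only)."""
    a, b = release_tuple(installed), release_tuple(required)
    n = max(len(a), len(b))
    # Pad with zeros so that '0.14' compares like '0.14.0'.
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


# Minimum seaborn version named in this thread for use with pandas 2.2.
REQUIRED = {"seaborn": "0.13.2"}

for pkg, minimum in REQUIRED.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
        continue
    status = "OK" if at_least(installed, minimum) else f"upgrade to >= {minimum}"
    print(f"{pkg} {installed}: {status}")
```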

jlchang commented 1 month ago

Thanks for the pointer to seaborn - always a great option for plotting!

According to the pandas team, the new behavior is a feature, not a bug. The logic behind the change is reasonable; the change in plotting behavior was just unexpected and not well documented. Going forward, call .sort_index().plot() to get the same plot that .plot() used to generate.
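A minimal sketch of that workaround on hypothetical data (the sales series and the plot_in_index_order helper are illustrative names, not part of pandas or the tutorial). Sorting first restores the chronological order that .plot() used to produce:

```python
import pandas as pd

# Hypothetical monthly data whose DatetimeIndex is out of order,
# e.g. after concatenating files in arbitrary order.
sales = pd.Series(
    [30, 10, 40, 20],
    index=pd.to_datetime(["2024-03-01", "2024-01-01", "2024-04-01", "2024-02-01"]),
)
print(sales.index.is_monotonic_increasing)  # False

ordered = sales.sort_index()
print(ordered.tolist())  # [10, 20, 30, 40]


def plot_in_index_order(obj, **kwargs):
    """Hypothetical helper: sort a Series/DataFrame by index, then plot.

    Requires matplotlib at call time; extra kwargs go to .plot().
    """
    return obj.sort_index().plot(**kwargs)
```

Calling plot_in_index_order(sales) draws the line in date order; the same pattern works for a DataFrame such as the tutorial's albany.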