scikit-hep / mplhep

Extended histogram plotting on top of matplotlib and HEP collaboration compatible styling
https://mplhep.readthedocs.io
MIT License
185 stars 64 forks source link

mplhep styles with contourf log scale causes hang #362

Closed matthewfeickert closed 2 years ago

matthewfeickert commented 2 years ago

(Available as a GitHub Gist here)

I'm trying to do an interpolation of some points on a grid with mplhep v0.3.21 for an analysis at the moment. However, if I try to use matplotlib.pyplot.contourf with log spaces color levels using mplhep will cause a warning along the lines of

# Locator attempting to generate 18497 ticks ([0.15405694150420948, ..., 999.9912470033628]), which exceeds Locator.MAXTICKS (1000).

and it will then hang (I don't know if it resolves, as I kill it with a KeyboardInterrupt when it is clear it won't finish in a second or so).

If I comment out the use of plt.style.use(mplhep.style.ATLAS) (this happens with mplhep.style.ROOT too) then things proceed as normal and produce the plot in example.py.

example.py: ```python import matplotlib.colors as colors import mplhep import numpy as np from matplotlib import pyplot as plt from matplotlib import ticker from matplotlib.ticker import LogFormatterSciNotation from numpy import random from numpy.random import PCG64, SeedSequence from scipy.interpolate import LinearNDInterpolator # generate observed values (x, y) x_step = 250 x_range = np.arange(750, 3000 + x_step, step=x_step) y_step = 150 y_range = np.arange(700, 2000 + y_step, step=y_step) bit_generator = PCG64(SeedSequence(0)) rng = random.default_rng(bit_generator) x = [] y = [] for step in x_range: choose_n = rng.integers(low=0, high=y_range.size - 2) for value in y_range[: choose_n + 2]: x.append(step) y.append(value) x = np.asarray(x) y = np.asarray(y) # Generate uniform data on the interval [1e-3, 100.] # Uniform [a,b) = (b - a) * random_sample() + a uniform_range = [1e-3, 100.0] z = (uniform_range[1] - uniform_range[0]) * rng.random(x.size) + uniform_range[0] # Generate a 2D grid x_coords = np.linspace(min(x), max(x), 100) y_coords = np.linspace(min(y), max(y), 100) x_grid, y_grid = np.meshgrid(x_coords, y_coords) # 2D grid for interpolation # Interpolate with function of choice across the grid # between the known values interp_func = LinearNDInterpolator(list(zip(x, y)), z) interp_Z = interp_func(x_grid, y_grid) # plt.style.use(mplhep.style.ATLAS) # Uncommenting causes hang # Locator attempting to generate 18497 ticks ([0.15405694150420948, ..., 999.9912470033628]), which exceeds Locator.MAXTICKS (1000). fig, ax = plt.subplots() # plot interpolated values real_valued_Z = interp_Z[~np.isnan(interp_Z)] step_exp = 0.25 levels_exp = np.arange( np.floor(np.log10(real_valued_Z.min()) - 1), np.ceil(np.log10(real_valued_Z.max()) + 1) + step_exp, step=step_exp, ) levels = np.power(10, levels_exp) cs = ax.contourf( x_grid, y_grid, interp_Z, cmap="PuBu_r", levels=levels, norm=colors.LogNorm(vmin=levels.min(), vmax=levels.max()), ) formatter = LogFormatterSciNotation(10, labelOnlyBase=False) cbar = fig.colorbar(cs, format=formatter) # plot observed values scatter_size = 15.0 ax.scatter(x, y, s=scatter_size, color="white", edgecolors="black") ax.set_xlabel(r"$x$") ax.set_ylabel(r"$y$") cbar.set_label(r"$z(x, y)$") file_types = ["png", "pdf"] for extension in file_types: fig.savefig(f"example.{extension}") ```

example

What is interesting, is if I attempt to get around this by using pcolormesh and then contour to create a similar-ish type of plot there is no issue and things work as expected as can be seen in example_pmesh.py.

example_pmesh.py: ```python import matplotlib.colors as colors import mplhep import numpy as np from matplotlib import pyplot as plt from matplotlib import ticker from matplotlib.ticker import LogFormatterSciNotation from numpy import random from numpy.random import PCG64, SeedSequence from scipy.interpolate import LinearNDInterpolator # generate observed values (x, y) x_step = 250 x_range = np.arange(750, 3000 + x_step, step=x_step) y_step = 150 y_range = np.arange(700, 2000 + y_step, step=y_step) bit_generator = PCG64(SeedSequence(0)) rng = random.default_rng(bit_generator) x = [] y = [] for step in x_range: choose_n = rng.integers(low=0, high=y_range.size - 2) for value in y_range[: choose_n + 2]: x.append(step) y.append(value) x = np.asarray(x) y = np.asarray(y) # Generate uniform data on the interval [1e-3, 100.] # Uniform [a,b) = (b - a) * random_sample() + a uniform_range = [1e-3, 100.0] z = (uniform_range[1] - uniform_range[0]) * rng.random(x.size) + uniform_range[0] # Generate a 2D grid x_coords = np.linspace(min(x), max(x), 100) y_coords = np.linspace(min(y), max(y), 100) x_grid, y_grid = np.meshgrid(x_coords, y_coords) # 2D grid for interpolation # Interpolate with function of choice across the grid # between the known values interp_func = LinearNDInterpolator(list(zip(x, y)), z) interp_Z = interp_func(x_grid, y_grid) plt.style.use(mplhep.style.ATLAS) # Here there is no problem fig, ax = plt.subplots() # plot interpolated values real_valued_Z = interp_Z[~np.isnan(interp_Z)] step_exp = 0.25 levels_exp = np.arange( np.floor(np.log10(real_valued_Z.min()) - 1), np.ceil(np.log10(real_valued_Z.max()) + 1) + step_exp, step=step_exp, ) levels = np.power(10, levels_exp) pcm = ax.pcolormesh( x_grid, y_grid, interp_Z, cmap="PuBu_r", norm=colors.LogNorm(vmin=levels.min(), vmax=levels.max()), shading="auto", ) formatter = LogFormatterSciNotation(10, labelOnlyBase=False) cbar = fig.colorbar(pcm, format=formatter) cs = ax.contour( x_grid, y_grid, interp_Z, cmap="PuBu_r", ) # plot observed values scatter_size = 15.0 ax.scatter(x, y, s=scatter_size, color="white", edgecolors="black") ax.set_xlabel(r"$x$") ax.set_ylabel(r"$y$") cbar.set_label(r"$z(x, y)$") file_types = ["png", "pdf"] for extension in file_types: fig.savefig(f"example_pmesh.{extension}") ```

example_pmesh

@andrzejnovak Do you have any idea why this could be happening, and if there is a way to avoid it or fix it?

cc @kratsg

andrzejnovak commented 2 years ago

Oh sneaky, since the MRE doesn't seem to use any mplhep functions I think this is a matplotlib error actually. Probably one of the rcParams in

https://github.com/scikit-hep/mplhep/blob/dd70545186c9f26a485f30f7d0d9133475a878d4/src/mplhep/styles/atlas.py#L45-L77

Could you test them with your code and https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.rc_context.html#matplotlib.pyplot.rc_context ?

matthewfeickert commented 2 years ago

@andrzejnovak Ah good point! Indeed, the problem is

https://github.com/scikit-hep/mplhep/blob/dd70545186c9f26a485f30f7d0d9133475a878d4/src/mplhep/styles/atlas.py#L74

If I use the matplotlib.rc_context to override mplhep.style.ATLAS for just this then things work as expected

from matplotlib import rc_context

...

plt.style.use(mplhep.style.ATLAS)

rc_options = {"ytick.minor.visible": False}  # yticks
with rc_context(rc_options):
    fig, ax = plt.subplots()
    ...
Updated example.py: ```python import matplotlib.colors as colors import mplhep import numpy as np from matplotlib import pyplot as plt from matplotlib import rc_context from matplotlib.ticker import LogFormatterSciNotation from numpy import random from numpy.random import PCG64, SeedSequence from scipy.interpolate import LinearNDInterpolator # generate observed values (x, y) x_step = 250 x_range = np.arange(750, 3000 + x_step, step=x_step) y_step = 150 y_range = np.arange(700, 2000 + y_step, step=y_step) bit_generator = PCG64(SeedSequence(0)) rng = random.default_rng(bit_generator) x = [] y = [] for step in x_range: choose_n = rng.integers(low=0, high=y_range.size - 2) for value in y_range[: choose_n + 2]: x.append(step) y.append(value) x = np.asarray(x) y = np.asarray(y) # Generate uniform data on the interval [1e-3, 100.] # Uniform [a,b) = (b - a) * random_sample() + a uniform_range = [1e-3, 100.0] z = (uniform_range[1] - uniform_range[0]) * rng.random(x.size) + uniform_range[0] # Generate a 2D grid x_coords = np.linspace(min(x), max(x), 100) y_coords = np.linspace(min(y), max(y), 100) x_grid, y_grid = np.meshgrid(x_coords, y_coords) # 2D grid for interpolation # Interpolate with function of choice across the grid # between the known values interp_func = LinearNDInterpolator(list(zip(x, y)), z) interp_Z = interp_func(x_grid, y_grid) plt.style.use(mplhep.style.ATLAS) # c.f. https://github.com/scikit-hep/mplhep/issues/362 rc_options = {"ytick.minor.visible": False} # yticks with rc_context(rc_options): fig, ax = plt.subplots() # plot interpolated values real_valued_Z = interp_Z[~np.isnan(interp_Z)] step_exp = 0.25 levels_exp = np.arange( np.floor(np.log10(real_valued_Z.min()) - 1), np.ceil(np.log10(real_valued_Z.max()) + 1) + step_exp, step=step_exp, ) levels = np.power(10, levels_exp) cs = ax.contourf( x_grid, y_grid, interp_Z, cmap="PuBu_r", levels=levels, norm=colors.LogNorm(vmin=levels.min(), vmax=levels.max()), ) formatter = LogFormatterSciNotation(10, labelOnlyBase=False) cbar = fig.colorbar(cs, format=formatter) # plot observed values scatter_size = 15.0 ax.scatter(x, y, s=scatter_size, color="white", edgecolors="black") ax.set_xlabel(r"$x$") ax.set_ylabel(r"$y$") cbar.set_label(r"$z(x, y)$") file_types = ["png", "pdf"] for extension in file_types: fig.savefig(f"example.{extension}") ```

Similarly, I can get things to fail without using mplhep at all with just

from matplotlib import rc_context

...

rc_options = {"ytick.minor.visible": False}  # yticks
with rc_context(rc_options):
    fig, ax = plt.subplots()
    ...

I've updated the Gist too.

matthewfeickert commented 2 years ago

"ytick.minor.visible": True is used by all of the LHC experiment styles

https://github.com/scikit-hep/mplhep/blob/dd70545186c9f26a485f30f7d0d9133475a878d4/src/mplhep/styles/alice.py#L33

https://github.com/scikit-hep/mplhep/blob/65df557c39e0b24ff4ecf8453e680d301ace5921/src/mplhep/styles/cms.py#L49

https://github.com/scikit-hep/mplhep/blob/c1f98951b153af9e2d77a94504b01bfb3f53dcb8/src/mplhep/styles/atlas.py#L74

https://github.com/scikit-hep/mplhep/blob/4ccc8159e3f0c4a01cb1a9b867a273cf9b511b30/src/mplhep/styles/lhcb.py#L192

and as mplhep.style.ROOT is mplhep.style.CMS

https://github.com/scikit-hep/mplhep/blob/a1003334a58cb785d77efd07250925db6ffba564/src/mplhep/styles/cms.py#L69

that's why we see it there too.

This is at least functionally resolved (:+1: ), but I'll follow up with matplotlib people to see if I am writing bad code here for what I want to do.

andrzejnovak commented 2 years ago

Thanks @matthewfeickert

matthewfeickert commented 2 years ago

Fun note! While making these examples using generated random data so I could make these examples public I learned that

from numpy import random
random.seed(0)

is considered bad and we should instead be doing things like

from numpy import random
from numpy.random import PCG64, SeedSequence
rng = random.default_rng(PCG64(SeedSequence(0)))  # Generator(PCG64) at 0x7F00A8E519E0
rng.random()  # 0.6369616873214543

Seems this is related to NEP 19 and there's a blog post more on this (Stop using numpy.random.seed()).

andrzejnovak commented 2 years ago

@matthewfeickert can we close this?

matthewfeickert commented 2 years ago

can we close this?

Yeah, I'll reopen if needed. Once I get a better summary of what is happening in matplotlib I can add a note here as well.

matthewfeickert commented 2 years ago

So @tacaswell was indeed right (from the IRIS-HEP Slack chats)

Can you check with the main branch? We may have fixed that recently. We made big improvements to how ticks in colorbars works (now basically the same as everything else) but that exposed just how many "special" things were in there

as going following the instructions in https://github.com/matplotlib/matplotlib/issues/9994#issuecomment-808985409 on how to find the nightly matplotlib wheels that GHA builds, which I'll recap here:

  1. Go to the Build CI wheels GHA workflow on matplotlib's GitHub and select the latest PR build.
  2. Select the "wheels" artifact at the bottom of the build page and download it (might as well do this manually rather than use curl as there is no stable URL that you can consistently hit with this method):

matplotlib-artifact

  1. Unzip the zipfile

    $ mkdir matplotlib_wheels
    $ unzip wheels.zip -d matplotlib_wheels/
  2. Install the wheel that matches your CPython install. e.g, for me:

$ python -m pip install --upgrade matplotlib_wheels/*cp39-cp39-manylinux*x86_64.whl
$ python -m pip show matplotlib
Name: matplotlib
Version: 3.6.0.dev1922+gccf5115389
Summary: Python plotting package
Home-page: https://matplotlib.org
Author: John D. Hunter, Michael Droettboom
Author-email: matplotlib-users@python.org
License: PSF
Location: /home/feickert/.pyenv/versions/3.9.6/envs/mplhep-bug/lib/python3.9/site-packages
Requires: cycler, fonttools, kiwisolver, numpy, packaging, pillow, pyparsing, python-dateutil
Required-by: mplhep

With this preview of the next matplotlib release my example file with the mplhep defaults (the minimal edits to get that back are to use

$ git diff
diff --git a/example.py b/example.py
index 1d29ef4..d31f19e 100644
--- a/example.py
+++ b/example.py
@@ -46,7 +46,7 @@ interp_Z = interp_func(x_grid, y_grid)
 plt.style.use(mplhep.style.ATLAS)

 # c.f. https://github.com/scikit-hep/mplhep/issues/362
-rc_options = {"ytick.minor.visible": False}  # yticks
+rc_options = {"ytick.minor.visible": True}  # yticks
 with rc_context(rc_options):
     fig, ax = plt.subplots()

)

then things work fine. :+1:

example


edit: matplotlib now has nightly wheels, so to get them just do

$ python -m pip install \
  --upgrade \
  --pre \
  --index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple \
  --extra-index-url https://pypi.org/simple \
  matplotlib