pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Flaky matplotlib tests #27143

Closed jbrockmendel closed 5 years ago

jbrockmendel commented 5 years ago

I'm frequently getting local test runs where nearly all of the tests in pandas/tests/plotting fail with something like:


self = <pandas.tests.plotting.test_misc.TestSeriesPlots object at 0x12fc977b8>

    @pytest.mark.slow
    def test_autocorrelation_plot(self):
        from pandas.plotting import autocorrelation_plot
>       _check_plot_works(autocorrelation_plot, series=self.ts)

pandas/tests/plotting/test_misc.py:42: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/tests/plotting/common.py:509: in _check_plot_works
    ret = f(**kwargs)
pandas/plotting/_misc.py:374: in autocorrelation_plot
    plot_backend = _get_plot_backend()
pandas/plotting/_core.py:630: in _get_plot_backend
    return importlib.import_module(backend_str)
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

name = 'not_an_existing_module', import_ = <function _gcd_import at 0x10927ae18>

>   ???
E   ModuleNotFoundError: No module named 'not_an_existing_module'

<frozen importlib._bootstrap>:965: ModuleNotFoundError

Re-running with --lf, all of these consistently pass.

OSX, PY37
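
The bottom of that traceback can be reproduced without pandas at all. A minimal sketch, assuming (from the frames above) that `_get_plot_backend` simply hands the configured backend string to `importlib.import_module`; `load_backend` and `try_backend` are hypothetical stand-ins, not pandas functions:

```python
import importlib


def load_backend(backend_str):
    """Hypothetical stand-in for the import step seen in the traceback."""
    return importlib.import_module(backend_str)


def try_backend(backend_str):
    """Return the imported module's name, or the error message on failure."""
    try:
        return load_backend(backend_str).__name__
    except ModuleNotFoundError as err:
        return str(err)


result_ok = try_backend("json")  # any importable module works here
result_bad = try_backend("not_an_existing_module")
print(result_ok)
print(result_bad)
```

So the import itself behaves as expected; the open question is why the backend string has that value when the plotting tests run.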

jbrockmendel commented 5 years ago

Similar but not identical:

_____________________________________________________________________ TestRegistration.test_pandas_plots_register ______________________________________________________________________

self = <pandas.tests.plotting.test_converter.TestRegistration object at 0x132edc7f0>

    def test_pandas_plots_register(self):
        pytest.importorskip("matplotlib.pyplot")
        s = Series(range(12), index=date_range('2017', periods=12))
        # Set to the "warn" state, in case this isn't the first test run
        converter._WARN = True
        with tm.assert_produces_warning(None) as w:
>           s.plot()

pandas/tests/plotting/test_converter.py:90: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/plotting/_core.py:810: in __call__
    **kwds)
pandas/plotting/_core.py:768: in plot_series
    **kwds)
pandas/plotting/_core.py:716: in _plot
    plot_obj.generate()
pandas/plotting/_matplotlib/core.py:222: in generate
    self._post_plot_logic_common(ax, self.data)
pandas/plotting/_matplotlib/core.py:375: in _post_plot_logic_common
    fontsize=self.fontsize)
pandas/plotting/_matplotlib/core.py:454: in _apply_axis_properties
    labels = axis.get_majorticklabels() + axis.get_minorticklabels()
/usr/local/lib/python3.7/site-packages/matplotlib/axis.py:1253: in get_majorticklabels
    ticks = self.get_major_ticks()
/usr/local/lib/python3.7/site-packages/matplotlib/axis.py:1408: in get_major_ticks
    numticks = len(self.get_majorticklocs())
/usr/local/lib/python3.7/site-packages/matplotlib/axis.py:1325: in get_majorticklocs
    return self.major.locator()
/usr/local/lib/python3.7/site-packages/matplotlib/dates.py:1434: in __call__
    self.refresh()
/usr/local/lib/python3.7/site-packages/matplotlib/dates.py:1454: in refresh
    dmin, dmax = self.viewlim_to_dt()
/usr/local/lib/python3.7/site-packages/matplotlib/dates.py:1206: in viewlim_to_dt
    return num2date(vmin, self.tz), num2date(vmax, self.tz)
/usr/local/lib/python3.7/site-packages/matplotlib/dates.py:496: in num2date
    return _from_ordinalf(x, tz)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = 1.4841792e+18, tz = datetime.timezone.utc

    def _from_ordinalf(x, tz=None):
        """
        Convert Gregorian float of the date, preserving hours, minutes,
        seconds and microseconds.  Return value is a `.datetime`.

        The input date *x* is a float in ordinal days at UTC, and the output will
        be the specified `.datetime` object corresponding to that time in
        timezone *tz*, or if *tz* is ``None``, in the timezone specified in
        :rc:`timezone`.
        """
        if tz is None:
            tz = _get_rc_timezone()

        ix, remainder = divmod(x, 1)
        ix = int(ix)
        if ix < 1:
            raise ValueError('Cannot convert {} to a date.  This often happens if '
                             'non-datetime values are passed to an axis that '
                             'expects datetime objects.'.format(ix))
>       dt = datetime.datetime.fromordinal(ix).replace(tzinfo=UTC)
E       OverflowError: signed integer is greater than maximum

/usr/local/lib/python3.7/site-packages/matplotlib/dates.py:292: OverflowError

@datapythonista I think I've only noticed these since the big pd.plotting refactor. Is there any chance the two are related?
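
One observation on the second traceback: read as nanoseconds since the Unix epoch, the `x` value is exactly 2017-01-12, the last point of `date_range('2017', periods=12)`. That would be consistent with the axis limits being in nanosecond units while the matplotlib date locator reads them as ordinal days, but that is a guess from the numbers, not something confirmed here. A small stdlib-only sketch of the arithmetic:

```python
import datetime

x = 1.4841792e18  # value of `x` from the traceback

# Read as nanoseconds since the Unix epoch, x is a plausible timestamp.
seconds = int(x) // 10**9
as_epoch_ns = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
print(as_epoch_ns.isoformat())

# Read as ordinal days (what `_from_ordinalf` does), it cannot be converted.
try:
    datetime.datetime.fromordinal(int(x))
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)
```

The exact wording of the `OverflowError` message depends on the Python version, so the sketch only checks the exception type.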

datapythonista commented 5 years ago

I guess the first one is happening because we access the same plotting.backend setting from different tests. Not sure why this happens, but we may need to mark tests as single.

No idea what could be causing the second error. It doesn't seem related, but I don't know.

jbrockmendel commented 5 years ago

I did some troubleshooting of the latter and found some weird behavior in matplotlib. If the first set of errors is getting taken care of, that'd be great.

jbrockmendel commented 5 years ago

It looks like pd.get_option('plotting.backend') is returning "not_an_existing_module".
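
If some test sets that value on purpose and never restores it, every later test in the same process would see it. A rough sketch of the difference between a leaky and a scoped change, using a hypothetical `_options` dict and `option_scope` helper rather than the pandas implementation (pandas itself provides `pd.option_context` for scoping an option):

```python
import contextlib

# Hypothetical process-wide options store; not the pandas implementation.
_options = {"plotting.backend": "matplotlib"}


@contextlib.contextmanager
def option_scope(key, value):
    """Temporarily set an option and always restore the previous value."""
    previous = _options[key]
    _options[key] = value
    try:
        yield
    finally:
        _options[key] = previous


def leaky_test():
    # Sets the option and never puts it back.
    _options["plotting.backend"] = "not_an_existing_module"


def scoped_test():
    with option_scope("plotting.backend", "not_an_existing_module"):
        pass  # exercise the error path here


scoped_test()
after_scoped = _options["plotting.backend"]
leaky_test()
after_leaky = _options["plotting.backend"]
print(after_scoped, after_leaky)
```

Only the leaky variant leaves the bad backend name behind for whatever runs next.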

datapythonista commented 5 years ago

Not sure whether it's better to add the single marker for these problems, or to implement a lock as a decorator and simply lock access to those shared resources. Will try to spend some time on this soon.
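
For reference, the lock-as-a-decorator idea could look roughly like this. It is only a sketch: `locked`, `_backend_lock`, and the `shared` dict are hypothetical names, and a `threading.Lock` only serializes threads inside one process (separate worker processes would need an inter-process lock, and would not share module-level option state in the first place).

```python
import functools
import threading

# One lock per shared resource; this one guards the (hypothetical) backend option.
_backend_lock = threading.Lock()


def locked(lock):
    """Decorator factory: run the wrapped function while holding `lock`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator


shared = {"plotting.backend": "matplotlib"}
observed = []


@locked(_backend_lock)
def touch_backend(value):
    # Set, read back, and restore without another thread interleaving.
    previous = shared["plotting.backend"]
    shared["plotting.backend"] = value
    observed.append(shared["plotting.backend"] == value)
    shared["plotting.backend"] = previous


threads = [
    threading.Thread(target=touch_backend, args=(f"backend_{i}",))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(observed), shared["plotting.backend"])
```

A lock like this prevents interleaving, but it does not help if a test changes the option and simply never restores it.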

TomAugspurger commented 5 years ago

FYI @jbrockmendel I'm not seeing any failures with pytest pandas/tests/plotting on pandas master and matplotlib 3.1.1.

jbrockmendel commented 5 years ago

Git bisecting this, the failures start after #26753.

TomAugspurger commented 5 years ago

There was a week or so before I had a follow-up delaying the import of matplotlib.pyplot. Not sure if that’s related though.


jbrockmendel commented 5 years ago

Got a good hypothesis for the cause: I have pytest-randomly installed. @TomAugspurger can you try with that plugin and see if you can reproduce?
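
The reasoning: pytest-randomly shuffles test order, so a test that leaves the backend option changed can end up running before the plotting tests. That would also fit the --lf observation, since a re-run of only the failed tests does not include the test that changed the option. A toy illustration with hypothetical test functions and a plain dict standing in for the option:

```python
import random

state = {"plotting.backend": "matplotlib"}


def test_sets_bad_backend():
    # Leaks: the option is never restored.
    state["plotting.backend"] = "not_an_existing_module"
    return True


def test_plot_works():
    # Passes only if nothing before it changed the backend.
    return state["plotting.backend"] == "matplotlib"


def run(tests):
    """Run tests in the given order from a clean state; return name -> passed."""
    state["plotting.backend"] = "matplotlib"
    return {t.__name__: t() for t in tests}


in_file_order = run([test_plot_works, test_sets_bad_backend])
reversed_order = run([test_sets_bad_backend, test_plot_works])
print(in_file_order)
print(reversed_order)

# A seeded shuffle makes whichever order comes up repeatable.
tests = [test_plot_works, test_sets_bad_backend]
random.Random(1234).shuffle(tests)
shuffled = run(tests)
print(shuffled)
```

The same two tests pass in one order and fail in the other, which is the kind of flakiness a random ordering would expose.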

TomAugspurger commented 5 years ago

@jbrockmendel for better or worse, no failures with pytest-randomly installed (I assume there's no flag to enable it?)

jbrockmendel commented 5 years ago

This hasn't caused any trouble locally for a month; closing.