lincc-frameworks / tape

[Deprecated] Package for working with LSST time series data
https://tape.readthedocs.io
MIT License
12 stars 3 forks

Fix Flaky Unit Test Smoke Test #427

Closed wilsonbb closed 4 months ago

wilsonbb commented 4 months ago

Bug report

Since the merging of https://github.com/lincc-frameworks/tape/pull/420, there has been some flakiness in the unit tests that appears to result in consistent smoke test failures. An example failing run is here, with the core error being:

=================================== FAILURES ===================================
____________ test_batch_single_lc[parquet_ensemble_with_divisions] _____________

data_fixture = 'parquet_ensemble_with_divisions'
request = <FixtureRequest for <Function test_batch_single_lc[parquet_ensemble_with_divisions]>>

    @pytest.mark.parametrize("data_fixture", ["parquet_ensemble", "parquet_ensemble_with_divisions"])
    def test_batch_single_lc(data_fixture, request):
        """
        Test that ensemble.batch() can run a function on a single light curve.
        """
        parquet_ensemble = request.getfixturevalue(data_fixture)

        # Perform batch only on this specific lightcurve.
        lc = 88472935274829959

        # Check that we raise an error if single_lc is neither a bool nor an integer
        with pytest.raises(ValueError):
            parquet_ensemble.batch(calc_stetson_J, use_map=True, on=None, band_to_calc=None, single_lc="foo")

        lc_res = parquet_ensemble.prune(10).batch(
            calc_stetson_J, use_map=True, on=None, band_to_calc=None, single_lc=lc
        )
        assert len(lc_res) == 1

        # Now ensure that we got the same result when we ran the function on the entire ensemble.
        full_res = parquet_ensemble.prune(10).batch(calc_stetson_J, use_map=True, on=None, band_to_calc=None)
        assert full_res.compute().loc[lc].stetsonJ == lc_res.compute().iloc[0].stetsonJ

        # Check that when single_lc is True we will perform batch on a random lightcurve and still get only one result.
        rand_lc = parquet_ensemble.prune(10).batch(
            calc_stetson_J, use_map=True, on=None, band_to_calc=None, single_lc=True
        )
>       assert len(rand_lc) == 1

tests/tape_tests/test_ensemble.py:2157: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/dask_expr/_collection.py:381: in __len__
    return new_collection(Len(self)).compute()
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/dask_expr/_collection.py:453: in compute
    return DaskMethodsMixin.compute(out, **kwargs)
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/dask/base.py:375: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/dask/base.py:661: in compute
    results = schedule(dsk, keys, **kwargs)
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/dask_expr/_expr.py:3570: in _execute_task
    return dask.core.get(graph, name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

df = ('blockwisemerge-fused-blockwisemerge-6316c8b08c705ff476c17086ed1c0b51', 3)
iindexer = slice(88480001158382516, 88480001158382516, None), cindexer = None

    def loc(df, iindexer, cindexer=None):
        """
        .loc for known divisions
        """
        if cindexer is None:
>           return df.loc[iindexer]
E           AttributeError: 'tuple' object has no attribute 'loc'

Note that it makes sense that this might be flaky, since this is the part of the test where we're choosing a random lightcurve. When computing the batch result we then hit some sort of indexing failure, possibly because the chosen lightcurve is invalid in some way. I need to test locally to identify which choices of random lightcurve produce this error.
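For anyone debugging this locally, one quick sanity check is whether the randomly chosen object id actually falls inside the DataFrame's known divisions, since `.loc` with known divisions routes a lookup to a partition by those boundaries. A minimal stdlib sketch of that check (the helper name and example divisions here are made up for illustration, not part of TAPE or dask):

```python
import bisect

def partition_for_id(divisions, obj_id):
    """Return the partition index that would cover obj_id, or None if the
    id lies outside the known divisions entirely.

    divisions: sorted tuple of partition boundaries, as in a Dask DataFrame
    with known divisions; partition i covers [divisions[i], divisions[i+1]),
    with the last partition inclusive of divisions[-1].
    """
    if obj_id < divisions[0] or obj_id > divisions[-1]:
        return None  # id outside every partition; .loc may misbehave
    i = bisect.bisect_right(divisions, obj_id) - 1
    # Clamp so that obj_id == divisions[-1] maps to the last partition.
    return min(i, len(divisions) - 2)

# Hypothetical divisions, just to show the shape of the check:
divisions = (10, 20, 30, 40)
print(partition_for_id(divisions, 25))  # 1 (covered by [20, 30))
print(partition_for_id(divisions, 99))  # None (outside all divisions)
```

Logging this for each randomly selected lightcurve would at least rule out (or confirm) out-of-divisions ids as the trigger.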


wilsonbb commented 4 months ago

There have been some additional test failures that seem unrelated to the single_lc change and that I haven't been able to reproduce locally:

tests/tape_tests/test_ensemble.py:1944: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_collection.py:417: in __repr__
    data = self._repr_data().to_string(max_rows=5)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_collection.py:3993: in _repr_data
    index = self._repr_divisions
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_collection.py:2572: in _repr_divisions
    name = f"npartitions={self.npartitions}"
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_collection.py:344: in npartitions
    return self.expr.npartitions
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_expr.py:397: in npartitions
    return len(self.divisions) - 1
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/functools.py:993: in __get__
    val = self.func(instance)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_expr.py:381: in divisions
    return tuple(self._divisions())
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_expr.py:647: in _divisions
    return _get_divisions_map_partitions(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask/dataframe/core.py:7330: in _get_divisions_map_partitions
    divisions = max((d.divisions for d in dfs), key=len)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask/dataframe/core.py:7330: in <genexpr>
    divisions = max((d.divisions for d in dfs), key=len)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/functools.py:993: in __get__
    val = self.func(instance)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_expr.py:381: in divisions
    return tuple(self._divisions())
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_expr.py:528: in _divisions
    if not self._broadcast_dep(arg):
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_expr.py:519: in _broadcast_dep
    return dep.npartitions == 1 and dep.ndim < self.ndim
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_expr.py:397: in npartitions
    return len(self.divisions) - 1
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/functools.py:993: in __get__
    val = self.func(instance)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_expr.py:381: in divisions
    return tuple(self._divisions())
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_groupby.py:901: in _divisions
    if self.need_to_shuffle:
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/functools.py:993: in __get__
    val = self.func(instance)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dask_expr/_groupby.py:920: in need_to_shuffle
    if any(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

.0 = <set_iterator object at 0x7f683e2ee200>

    if any(
>       set(self._by_columns) >= set(cols)
        for cols in self.frame.unique_partition_mapping_columns_from_shuffle
    ):
E   TypeError: 'NoneType' object is not iterable

This is possibly due to a change in divisions behavior in a new version of dask-expr
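The TypeError itself is easy to see in isolation: `need_to_shuffle` calls `set(cols)` on whatever `unique_partition_mapping_columns_from_shuffle` yields, so if that property yields `None` instead of a tuple of column names, the comparison blows up. A minimal reproduction of just that failure mode (the variable names below are stand-ins, not the actual dask-expr internals):

```python
# Stand-in for frame.unique_partition_mapping_columns_from_shuffle yielding
# None instead of a tuple of column names.
partition_mappings = [None]
by_columns = ["id"]

try:
    # Mirrors the `any(set(self._by_columns) >= set(cols) ...)` check in
    # dask_expr._groupby.need_to_shuffle.
    any(set(by_columns) >= set(cols) for cols in partition_mappings)
except TypeError as err:
    print(err)  # 'NoneType' object is not iterable
```

This is consistent with the hypothesis that a newer dask-expr changed what that property returns for some frames, rather than anything in the TAPE test itself.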

The original errors above seem to occur when we run on specific lightcurves, though those lightcurves look fairly "normal", and I haven't figured out the origin of the error there.

wilsonbb commented 4 months ago

The fix https://github.com/lincc-frameworks/tape/pull/440 should hopefully make the smoke test green. If that's the case tomorrow, we'll close this issue and use https://github.com/lincc-frameworks/tape/issues/434 to continue tracking the underlying issues.

wilsonbb commented 4 months ago

The smoke test passed, so closing: https://github.com/lincc-frameworks/tape/actions/runs/8890536731