sktime / sktime

A unified framework for machine learning with time series
https://www.sktime.net
BSD 3-Clause "New" or "Revised" License

[BUG] logic problem with new CI - coverage of `TestAllEstimators` and `TestAllObjects` incorrect #6352

Closed: fkiraly closed this issue 4 months ago

fkiraly commented 4 months ago

I think there is a bug in the new CI. Coverage is not affected in practice, since the "old CI" is still running.

The issue occurs when, say, a forecaster changes.

The new CI will then trigger tests in sktime.forecasting, but not in sktime.tests, which contains `TestAllEstimators`. So we run the forecaster-specific tests for the changed forecaster, but not the generic object and estimator tests (`TestAllObjects`, `TestAllEstimators`).

FYI @yarnabrina, is there a way to fix this? The "one-liner" to "run all tests for estimator X" is check_estimator. However, this was intentionally avoided as an entry point in the original setup, since it distributes less well across workers via pytest-xdist.
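For context, the one-liner in question would be used roughly like this (a minimal sketch; `NaiveForecaster` is just a stand-in example):

from sktime.forecasting.naive import NaiveForecaster
from sktime.utils.estimator_checks import check_estimator

# runs the generic and estimator-type-specific tests for this one
# estimator, in a single process
check_estimator(NaiveForecaster)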

fkiraly commented 4 months ago

The "opposite case" is where there are changes in "other" modules, as well as in a forecaster. The TestAllEstimators test for that forecaster then runs in the "other" test job, not in the "module/forecasters" bucket: https://github.com/sktime/sktime/actions/runs/8861405811/job/24333294008?pr=6348

yarnabrina commented 4 months ago

I just commented on Discord as well: this job is not coming from the "other" case in the new CI. It's coming from the old CI, which relies on the Makefile; specifically, this line is being called to run the tests:

https://github.com/sktime/sktime/blob/3156135c0ecbe689782aa88fbb866c2019fbc934/.github/workflows/test.yml#L298

New CI jobs would have been called by these lines instead:

https://github.com/sktime/sktime/blob/3156135c0ecbe689782aa88fbb866c2019fbc934/.github/workflows/test_other.yml#L66-L82

fkiraly commented 4 months ago

Sorry, wrong link: https://github.com/sktime/sktime/actions/runs/8861405811/job/24334091449?pr=6348

This is coming from "other" in the new CI; screenshot: [image]

fkiraly commented 4 months ago

A solution could be to also run the estimator/object tests in every "module" test job; that will at least ensure they are run for changed forecasters.

We cannot turn them off in "other", as estimators may also live in "other" modules. But the duplicate run will trigger only if an "other" estimator and a "module" estimator are changed in the same PR, so the incidence is perhaps low.
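Concretely, each "module" job would then also collect the generic suites in sktime/tests alongside its own module, e.g. for forecasting (a sketch of the invocation, not the actual workflow wiring):

# hypothetical "forecasting" module job: the module's own tests
# plus the generic estimator/object suites in sktime/tests
python -m pytest sktime/forecasting sktime/tests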

yarnabrina commented 4 months ago

I've created PR #6353, please take a look. But based on sample runs on my laptop, this is going to make the new CI significantly slower, even in the test collection part. For example, the times before and after the change for forecasting with test collection only (--co flag) are 21.85 seconds (43669 tests) vs 314.01 seconds (79603 tests).

fkiraly commented 4 months ago

Only test collection?

Before we merge, then, can we diagnose where this is coming from?

Collection should certainly not take so long.

fkiraly commented 4 months ago

I just checked construction times with the code below. It looks like construction is most likely not the culprit. Most estimators construct instantaneously; some need tens of milliseconds, which is probably too long (but does not explain collection). It could of course be that I am missing something in a soft dependency I do not have installed.

from time import time

from sktime.registry import all_estimators
from sktime.utils.validation._dependencies import _check_estimator_deps

ests = all_estimators(return_names=False)

def _measure_init_time(cls, params=None):
    """Time one construction of cls with the given test parameters."""
    params = params or {}  # guard against None, which ** cannot unpack
    start = time()
    try:
        cls(**params)
    except Exception:
        pass  # a failing constructor still yields a rough timing
    end = time()
    return end - start

times = []

for est in ests:
    # skip estimators whose soft dependencies are not installed
    if _check_estimator_deps(est, severity="none"):
        params = est.get_test_params()
        if not isinstance(params, list):
            params = [params]
        for param in params:
            times.append((est, _measure_init_time(est, param)))
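
To spot the slow constructors, the measurements can then be ranked, e.g. (a small follow-up sketch, not part of the original snippet):

# print the ten slowest constructions, in milliseconds
times.sort(key=lambda pair: pair[1], reverse=True)
for est, t in times[:10]:
    print(f"{est.__name__}: {t * 1000:.1f} ms")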
yarnabrina commented 4 months ago

Just in case it's device- or OS-specific: what are the times for you for the cases reported above? I used these commands:

# pre-PR
python -m pytest sktime/forecasting --co

# post-PR
python -m pytest sktime/forecasting sktime/tests --co
yarnabrina commented 4 months ago

> I just checked construction times with the code below. It looks like construction is most likely not the culprit. Most estimators construct instantaneously; some need tens of milliseconds, which is probably too long (but does not explain collection). It could of course be that I am missing something in a soft dependency I do not have installed.

I don't think construction or soft dependencies affect the two commands I shared above, though I am not 100% confident.

fkiraly commented 4 months ago

I ran the first command on main, Python 3.11, Windows. I cancelled it after it had been running for 10 minutes.

That is very odd. The last time we measured test collection time, it was 15 sec (minimal soft deps) or 1 min (many extras).

update:

my show_versions:

Python dependencies:
          pip: 23.3.1
       sktime: 0.28.1
      sklearn: 1.3.2
       skbase: 0.7.6
        numpy: 1.26.2
        scipy: 1.11.4
       pandas: 2.1.4
   matplotlib: 3.8.3
       joblib: 1.3.2
        numba: 0.59.1
  statsmodels: 0.14.0
     pmdarima: 2.0.4
statsforecast: None
      tsfresh: 0.20.2
      tslearn: None
        torch: None
   tensorflow: None
tensorflow_probability: None
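
For reference, this dependency report can be reproduced with sktime's show_versions utility:

from sktime import show_versions

# prints system info plus core and soft dependency versions
show_versions()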
fkiraly commented 4 months ago

What are your timings?

I also note how strange this is, because isn't 600 s the standard timeout for pytest collect?

fkiraly commented 4 months ago

I think we are getting closer - I updated the issue https://github.com/sktime/sktime/issues/6344 with the problem description. I will now run a profiler on the test collection which takes too long.
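
One way to profile the collection step is to run cProfile around pytest (a sketch; the module path and stats cutoff are illustrative):

# profile collection only (--co) and save the stats
python -m cProfile -o collect.prof -m pytest sktime/forecasting --co -q

# show the 20 most expensive calls by cumulative time
python -c "import pstats; pstats.Stats('collect.prof').sort_stats('cumtime').print_stats(20)"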

yarnabrina commented 4 months ago

> For example, the times before and after the change for forecasting with test collection only (--co flag) are 21.85 seconds (43669 tests) vs 314.01 seconds (79603 tests).

These are my times. But I wonder why mine are so much faster than yours!

Idea: I don't have pytest-xdist or other plugins installed, so I disabled those in the configuration (setup.cfg). Can you try that once?
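
One way to try this without editing setup.cfg is pytest's -p no:NAME option, which disables a plugin for a single run:

# collect without pytest-xdist (and the cache plugin) to approximate
# a plugin-free environment
python -m pytest sktime/forecasting --co -p no:xdist -p no:cacheprovider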

yarnabrina commented 4 months ago

> I also note how strange this is, because isn't 600 s the standard timeout for pytest collect?

Yes, it's configured in setup.cfg. My guess is that it applies to test execution, not to collection, but I don't know how to check that.
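
Assuming the limit comes from the pytest-timeout plugin, one empirical check is to shrink the limit on the command line and see whether collection gets interrupted:

# if collection still runs well past 5 seconds, the timeout
# evidently does not apply to the collection phase
python -m pytest sktime/forecasting --co --timeout=5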

fkiraly commented 4 months ago

That is very odd. 20 sec is more in line with the times from this older issue: https://github.com/sktime/sktime/issues/4900

fkiraly commented 4 months ago

Should we move the test-times discussion here? https://github.com/sktime/sktime/issues/6344