Closed: fkiraly closed this issue 4 months ago.
The "opposite case" is where there are changes in "other" modules as well as in a forecaster. The `TestAllEstimators` test for that forecaster then runs in the "other" test job, not in the "module/forecasters" bucket:
https://github.com/sktime/sktime/actions/runs/8861405811/job/24333294008?pr=6348
I just commented on Discord as well: this job is not coming from the "other" case in the new CI. It's coming from the old CI relying on the Makefile; specifically, this line is being called to run the tests:
New CI jobs will have been called by this line instead:
Sorry, wrong link: https://github.com/sktime/sktime/actions/runs/8861405811/job/24334091449?pr=6348
This is coming from "other" in new CI, screenshot:
The solution could be to run tests in all "module" tests; that will at least ensure that the estimator/object tests are run for changed forecasters. We cannot turn them off in "other", as estimators may live in "other" modules. But that will trigger only if an "other" and a "module" estimator are changed in the same PR, so perhaps low incidence.
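A rough sketch of what this could look like in a workflow file (a hypothetical job step; the actual GitHub Actions configuration in sktime may be structured differently): each "module" job would append `sktime/tests` to its own pytest paths, so the estimator/object tests also run for estimators changed in that module.

```yaml
# hypothetical excerpt of a "module" test job
- name: Run forecasting module tests
  run: python -m pytest sktime/forecasting sktime/tests
```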
I've created PR #6353, please take a look. But based on sample runs on my laptop, this is going to make the new CI significantly slower, even the test collection part. For example, the times before and after the change for forecasting with just test collection (the `--co` flag) are 21.85 seconds (43669 tests) vs 314.01 seconds (79603 tests).
Only test collection?
Before we merge then, can we diagnose where this is coming from?
Collection should certainly not take so long.
I just checked construction times with the code below. It looks like construction is, most likely, not the culprit. Most estimators construct instantaneously; some need tens of milliseconds, which is probably too long (but does not explain collection). It could of course be that I am missing something in a soft dependency I do not have installed.
```python
from sktime.registry import all_estimators
from sktime.utils.validation._dependencies import _check_estimator_deps

ests = all_estimators(return_names=False)


def _measure_init_time(cls, params=None):
    """Return wall-clock seconds to construct cls with the given params."""
    from time import time

    params = params or {}
    start = time()
    try:
        cls(**params)
    except Exception:
        pass
    end = time()
    return end - start


times = []
for est in ests:
    if _check_estimator_deps(est, severity="none"):
        params = est.get_test_params()
        if not isinstance(params, list):
            params = [params]
        for param in params:
            times.append((est, _measure_init_time(est, param)))
```
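As a follow-up, the (estimator, seconds) pairs collected above can be sorted to surface the slowest constructors. A minimal sketch with stand-in data (the class names here are made up, not real sktime estimators):

```python
# stand-in for the `times` list built above: (estimator name, construction seconds)
times = [("FastEst", 0.0002), ("SlowEst", 0.031), ("MedEst", 0.004)]

# sort descending by construction time and report the worst offenders
slowest = sorted(times, key=lambda pair: pair[1], reverse=True)
for name, secs in slowest[:10]:
    print(f"{name}: {secs * 1000:.1f} ms")
```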
Just in case it's device- or OS-specific, what are the times for you for the above reported cases? I used these commands:
```shell
# pre-PR
python -m pytest sktime/forecasting --co
# post-PR
python -m pytest sktime/forecasting sktime/tests --co
```
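For reproducing the comparison, collection can also be timed from Python directly via `pytest.main` (a sketch; assumes pytest is importable and that the paths exist in an sktime checkout):

```python
import time

import pytest  # assumes pytest is installed


def time_collection(*paths):
    """Return wall-clock seconds pytest spends on collection only (--collect-only)."""
    start = time.perf_counter()
    pytest.main([*paths, "--collect-only", "-q"])
    return time.perf_counter() - start


# the two cases from the discussion (require an sktime checkout):
# time_collection("sktime/forecasting")                  # pre-PR
# time_collection("sktime/forecasting", "sktime/tests")  # post-PR
```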
> I just checked construction times with the code below. It looks like construction is, most likely, not the culprit. Most estimators construct instantaneously; some need tens of milliseconds, which is probably too long (but does not explain collection). It could of course be that I am missing something in a soft dependency I do not have installed.
I think construction or soft dependencies do not affect the above two commands I shared. I am not 100% confident though.
I ran the first command on main, Python 3.11, Windows. I cancelled it after it was running for 10 minutes.
That is very odd. Last time we measured test collection time, it was 15 sec (minimal soft deps) or 1 min (many extras).
update: my `show_versions`:

```
Python dependencies:
pip: 23.3.1
sktime: 0.28.1
sklearn: 1.3.2
skbase: 0.7.6
numpy: 1.26.2
scipy: 1.11.4
pandas: 2.1.4
matplotlib: 3.8.3
joblib: 1.3.2
numba: 0.59.1
statsmodels: 0.14.0
pmdarima: 2.0.4
statsforecast: None
tsfresh: 0.20.2
tslearn: None
torch: None
tensorflow: None
tensorflow_probability: None
```
what are your timings?
I also note how strange this is, because isn't 600s the standard timeout for pytest collect?
I think we are getting closer - I updated the issue https://github.com/sktime/sktime/issues/6344 with the problem description. I will now run a profiler on the test collection which takes too long.
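One way to profile only the collection phase is to wrap a `--collect-only` run of `pytest.main` in `cProfile` (a sketch; assumes pytest is installed, and uses the path from the discussion above):

```python
import cProfile
import pstats

import pytest

# profile only pytest's collection phase to see where the time goes
profiler = cProfile.Profile()
profiler.enable()
pytest.main(["sktime/forecasting", "--collect-only", "-q"])
profiler.disable()

# print the 20 functions with the highest cumulative time
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(20)
```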
> For example, the times before and after the change for forecasting with just test collection (the `--co` flag) are 21.85 seconds (43669 tests) vs 314.01 seconds (79603 tests).
These are my times. But I wonder why mine are so much faster than yours!
Idea: I don't have pytest-xdist or other plugins installed, so I disabled those in the configuration (setup.cfg). Can you try that once?
> I also note how strange this is, because isn't 600s the standard timeout for pytest collect?
Yes, it's configured in setup.cfg. My guess is that it applies to test execution, not to collection. I don't know how to check that.
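For reference, per the pytest-timeout documentation the `timeout` setting applies per test during execution, not to the collection phase. A sketch of what such a setup.cfg section typically looks like (the actual sktime values may differ):

```ini
[tool:pytest]
# pytest-timeout: per-test execution limit in seconds; collection is not covered
timeout = 600
```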
That is very odd. 20 sec is more in line with the times from this older issue: https://github.com/sktime/sktime/issues/4900
should we move the test times discussion to here? https://github.com/sktime/sktime/issues/6344
I think there is a bug in the new CI. Coverage is not affected, since the "old CI" is still running.

The issue occurs when, say, a forecaster changes. The new CI will then trigger tests in `sktime.forecasting`, but not in `sktime.tests`, which contains `TestAllEstimators`. So, we run the forecaster-specific tests for the changed forecaster, but not the object and estimator specific tests.

FYI @yarnabrina, is there a way to fix this? The "one liner" to "run all tests for estimator X" is `check_estimator` - however, this is intentionally avoided as an entry point in the original setup, since it distributes less well across workers via `pytest-xdist`.