sktime / sktime

A unified framework for machine learning with time series
https://www.sktime.net
BSD 3-Clause "New" or "Revised" License

[ENH] idea: testing environments by estimator #5719

Open fkiraly opened 10 months ago

fkiraly commented 10 months ago

An orthogonal idea for testing, FYI @yarnabrina:

If I write you python code that retrieves:

  1. all unique sets of dependencies, for individual estimators
  2. for each unique set in 1, the estimators giving rise to it

Would it be easy to set up CI that runs tests specific to these estimators? Say, if this is controllable via a pytest flag?

I think this is the only setup that truly scales with number of estimators going to infinity, because ultimately task specific modules will have the same problem of interacting dependency trees.
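For illustration, the grouping in steps 1 and 2 could be sketched as below. The estimator-to-dependency mapping here is a hypothetical stub; in sktime it would be built from `registry.all_estimators` and each estimator's `python_dependencies` tag.

```python
from collections import defaultdict

# Hypothetical stub: estimator name -> set of soft dependencies.
# In sktime this would be populated from all_estimators() and tags.
ESTIMATOR_DEPS = {
    "AutoARIMA": frozenset({"pmdarima"}),
    "ARIMA": frozenset({"pmdarima"}),
    "StatsForecastAutoARIMA": frozenset({"statsforecast"}),
    "Prophet": frozenset({"prophet"}),
    "VAR": frozenset({"statsmodels"}),
}

def unique_dependency_sets(estimator_deps):
    """Group estimators by their unique set of soft dependencies.

    Returns a dict mapping each unique dependency set (step 1)
    to the list of estimators giving rise to it (step 2).
    """
    groups = defaultdict(list)
    for name, deps in estimator_deps.items():
        groups[deps].append(name)
    return dict(groups)

groups = unique_dependency_sets(ESTIMATOR_DEPS)
# e.g. the frozenset({"pmdarima"}) group contains ARIMA and AutoARIMA
```

CI could then create one environment per key of `groups` and run only the tests of the estimators in that group, e.g. selected via a pytest flag.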

yarnabrina commented 10 months ago

Can you please explain in a bit more detail? I want to understand it better first.

On my first read, I interpreted it as the following:

  1. PR is created by user
  2. Detect modified modules using git
  3. Extract (possibly) modified estimators from the __all__ of these modules (are these always present?)
  4. Loop over estimators and extend the list if any has dependencies through inheritance
  5. Loop over estimators and detect python version and python dependencies from tags (do they always exist??)
  6. Store mapping of estimator name to python and soft dependency requirements (as JSON??)
  7. CI will loop over this dynamic output and create one job per combination of estimator, python version supported by that estimator, and the 3 operating systems

After a second read, I am not sure at all. Can you please share the steps you are planning (in python and in CI yaml)?

If possible, please tell me at what step my above understanding went wrong and it'll be easier for me to follow.
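As a rough illustration of steps 2 to 6, with the git output, the modules' `__all__` contents, and the estimator tags all stubbed out as hypothetical data (in a real run these would come from `git diff --name-only`, module introspection, and the estimators' tags, respectively):

```python
import json

# Hypothetical output of `git diff --name-only main` in a PR (step 2).
changed_files = [
    "sktime/forecasting/arima.py",
    "sktime/utils/validation/forecasting.py",
]

# Hypothetical module -> __all__ mapping (step 3); in practice obtained
# by importing each modified module and reading its __all__.
MODULE_ALL = {
    "sktime.forecasting.arima": ["ARIMA", "AutoARIMA"],
    "sktime.utils.validation.forecasting": [],
}

# Hypothetical estimator -> requirement tags (step 5).
ESTIMATOR_TAGS = {
    "ARIMA": {"python_version": ">=3.8", "python_dependencies": ["pmdarima"]},
    "AutoARIMA": {"python_version": ">=3.8", "python_dependencies": ["pmdarima"]},
}

def build_requirement_mapping(changed_files):
    """Map changed files to affected estimators and their requirements (step 6)."""
    modules = [
        f.removesuffix(".py").replace("/", ".")
        for f in changed_files
        if f.endswith(".py")
    ]
    estimators = [est for mod in modules for est in MODULE_ALL.get(mod, [])]
    return {est: ESTIMATOR_TAGS[est] for est in estimators}

mapping = build_requirement_mapping(changed_files)
print(json.dumps(mapping, indent=2))  # JSON artifact that CI jobs can consume
```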

fkiraly commented 10 months ago

Yes, I think you got what I meant right, except for step 7. Sorry for not explaining clearly.

The dynamic output should be:

Part 1: find all estimators that are affected by the change (affected, e.g., via inheritance etc)

Part 2: create environments and run tests

In most cases, only one estimator is affected, and then it is run for the product of python version and OS, with the current primary satisfying environment, i.e., package versions installed satisfying the estimator's requirements.
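The job matrix described above, one CI job per (python version, OS) pair for the affected estimator, could be generated dynamically along these lines; the version and OS lists below are hypothetical placeholders, in practice the versions would come from the estimator's `python_version` tag:

```python
import itertools

# Hypothetical placeholders; real values would be derived from the
# estimator's "python_version" tag and the repo's CI configuration.
PYTHON_VERSIONS = ["3.9", "3.10", "3.11"]
OPERATING_SYSTEMS = ["ubuntu-latest", "macos-latest", "windows-latest"]

def ci_matrix_for(estimator):
    """One job spec per (python, os) pair for the affected estimator."""
    return [
        {"estimator": estimator, "python": py, "os": os_name}
        for py, os_name in itertools.product(PYTHON_VERSIONS, OPERATING_SYSTEMS)
    ]

matrix = ci_matrix_for("AutoARIMA")
# 3 python versions x 3 operating systems = 9 jobs
```

A workflow could read this list (e.g. serialized as JSON) as a dynamic matrix, so the set of jobs scales with affected estimators rather than with the whole codebase.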

yarnabrina commented 9 months ago

2. Detect modified modules using git

Here's an idea to achieve this from a different discussion:

What I was thinking is very optimistic, and may have other problems. The idea is to do this:

  1. start CI with a python 3.11+ job, which has tomllib.
  2. read current pyproject.toml and that from main.
  3. specifically compare which sets of dependency specifications vary, which will be available as dictionaries if I am not mistaken.
  4. identify mismatched specifications.
  5. identify names of packages from mismatches, and python requirements if any.
  6. use it to find affected estimators, and affected environments if any.
  7. trigger CI only for those environment-estimator combinations (related to #5719)

It's very different from the current PR I think, probably not worth considering. We can close this conversation.

_Originally posted by @yarnabrina in https://github.com/sktime/sktime/pull/5727#discussion_r1479231782_

This should ideally work with a definite guarantee, given correct parsing of only the dedicated blocks; the alternative, with a slight chance of false positives, is already addressed by @fkiraly in #5727 using git diff.