martinfleis opened 1 year ago
This is something we've been talking about with scverse, especially since we are largely in the lineage of bioconductor and the R ecosystem.
An alternative I think we might be happy with is getting downstream packages to have a CI job that tests against pre-releases (e.g. `pip install --pre`). As long as the central packages are good about making pre-releases, actively maintained downstream packages should hit incompatibilities before a general release is made.
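As a minimal sketch of what such a CI step could do (the package names and the helper are illustrative, not an agreed convention), the job would upgrade the core stack to pre-releases before running the downstream suite:

```python
import subprocess
import sys

# Hypothetical list of upstream packages whose pre-releases we want to test against.
CORE_PACKAGES = ["numpy", "scipy", "pandas"]

def prerelease_install_command(packages):
    """Build a pip invocation that allows pre-release versions (--pre)."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "--pre", *packages]

# In a downstream CI job you would then run something like:
#   subprocess.run(prerelease_install_command(CORE_PACKAGES), check=True)
#   subprocess.run([sys.executable, "-m", "pytest"], check=True)
```

In practice this would likely just be two shell steps in the workflow file; the point is only that `--pre` is the single switch that makes existing downstream CI pick up release candidates.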
As we're also creating a registry of downstream packages in our ecosystem, we would be able to inspect dependent packages' recent CI runs (assuming they are using a workflow we provide on GitHub Actions) to get some visibility into whether errors are being encountered due to our releases.
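To sketch what that inspection could look like (the workflow file name `ecosystem-tests.yml` is a made-up placeholder; the REST endpoint for listing workflow runs is real), one could poll each registered repo and compute a failure rate over recent completed runs:

```python
import json
from urllib.request import Request, urlopen

# Hypothetical name of the shared workflow the ecosystem would provide
# to every registered downstream repo.
WORKFLOW_FILE = "ecosystem-tests.yml"

def runs_url(owner, repo, per_page=20):
    """URL of the GitHub REST endpoint listing recent runs of the shared workflow."""
    return (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{WORKFLOW_FILE}/runs?per_page={per_page}"
    )

def failure_rate(runs_payload):
    """Fraction of completed runs that failed, given a decoded /runs response."""
    completed = [r for r in runs_payload["workflow_runs"] if r["status"] == "completed"]
    if not completed:
        return 0.0
    failed = sum(1 for r in completed if r["conclusion"] == "failure")
    return failed / len(completed)

# Usage (needs network access and, for higher rate limits, an auth token):
#   payload = json.load(urlopen(Request(runs_url("geopandas", "geopandas"))))
#   print(failure_rate(payload))
```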
Astropy is trying to use https://github.com/astropy/astropy-integration-testing and we're about to try it for real in a few weeks. 🤞
Also see https://github.com/scientific-python/summit-2023/issues/3 for cross-project testing.
Thinking about this in more detail, I think it has three parts.
The first is running the test suite of an installed package via `pytest --pyargs package_name`. In some cases, packages (like geopandas, for example) use additional datasets for testing that are not shipped with the package. In the optimal situation, all necessary components are part of the package. However, that is not always possible, and we may want to figure out some pytest mark that can be used to filter out tests that are not expected to pass in this situation. That may take the form of some SPEC? Or become a part of a related one?

Asking downstream packages to test against nightly wheels or the main branch is also good to have, but it moves the responsibility to those downstream packages. Given that some larger packages may have more (or at least some) funding, it may make sense to give a helping hand and test against downstream packages from the upstream one. So IMHO these two approaches are complementary.
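One way such a mark could look (the name `needs_repo_data` is purely an illustration, not a proposed standard): downstream packages tag tests that rely on repository-only data, and the runner deselects them when testing an installed distribution.

```python
# conftest.py sketch for a downstream package.
# The mark name "needs_repo_data" is hypothetical.

def pytest_configure(config):
    # Register the marker so `pytest --strict-markers` does not complain.
    config.addinivalue_line(
        "markers",
        "needs_repo_data: test relies on data present only in the git repository",
    )

# A test that cannot pass from an installed wheel would then look like:
#
#   @pytest.mark.needs_repo_data
#   def test_reads_fixture_file():
#       ...
#
# and the reverse-dependency runner would deselect it with:
#   pytest -m "not needs_repo_data" --pyargs package_name
```

The `-m "not <mark>"` deselection is standard pytest behavior, so the only thing a SPEC would need to standardize is the mark's name and meaning.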
Got a solution for a conda-forge-based reverse dependency check:
```
mamba repoquery whoneeds -c conda-forge <package>
```
It returns all versions of downstream packages but that is easy to filter.
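Sketching that filtering step (assuming the plain-text output format where each line starts with the package name followed by version and build columns; the example lines are made up):

```python
def unique_downstream_names(repoquery_lines):
    """Collapse `mamba repoquery whoneeds` output lines to unique package names,
    preserving first-seen order."""
    names = []
    seen = set()
    for line in repoquery_lines:
        parts = line.split()
        if not parts:
            continue
        name = parts[0]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names

# Example with made-up output lines:
lines = [
    "geopandas 0.13.2 pyhd8ed1ab_0",
    "geopandas 0.14.0 pyhd8ed1ab_0",
    "momepy 0.6.0 pyhd8ed1ab_0",
]
print(unique_downstream_names(lines))  # ['geopandas', 'momepy']
```

Using `--json` output and proper parsing would be more robust, but the idea is the same: collapse the per-version rows into one entry per downstream package.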
This took the shape of https://github.com/scientific-python/reverse-dependency-testing.
I was recently chatting with some folks developing R packages and learned that when they want to make a release available on CRAN, the package is installed in the environment and the test suites of all downstream packages are run to ensure that the new release does not come with an unexpected breaking change. While I think this is a bit too much (and it does indeed cause some friction and negativity), I'd like to explore options for doing the same with Python.
I can imagine we have nightly runs that install a defined set of downstream packages (not one or two, but more like twenty or forty) and run their tests against the current main. While this is already technically feasible, I am not aware of anyone doing it comprehensively, and it currently faces the issue of inconsistency of build and CI systems.
We could clone each repo and build from source, but then we need to know how each of the downstream packages gets built and prepare for that, which is non-trivial for a lot of spatial stuff depending on GDAL and friends. However, I don't think we need to test against main; testing against the latest release should be enough. But that comes with another issue: if we install the packages from PyPI, the tests are sometimes not part of the distribution, and in other cases will not run because some data they use is available only in the repo itself.
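A nightly runner along these lines could be sketched as follows (the registry contents and the structure are illustrative; it assumes each package ships its tests in the distribution so `pytest --pyargs` can find them, which, as noted above, does not always hold):

```python
import subprocess
import sys

# Hypothetical registry of downstream packages to exercise nightly.
DOWNSTREAM = ["geopandas", "momepy", "libpysal"]

def pytest_command(package):
    """Command that runs the installed test suite of `package`."""
    return [sys.executable, "-m", "pytest", "--pyargs", package]

def run_downstream_suite(package):
    """Install the latest release of `package` and run its installed tests.

    Returns True when the suite passes.
    """
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--upgrade", package],
        check=True,
    )
    return subprocess.run(pytest_command(package)).returncode == 0

# A nightly job would loop over the registry and report the failures:
#   failures = [pkg for pkg in DOWNSTREAM if not run_downstream_suite(pkg)]
```

Running each suite in a fresh environment (or container) per package would avoid the downstream packages' dependencies conflicting with each other, but that is an orchestration detail on top of the same core loop.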
I don't have a clear idea of how to tackle this, but I'd love to spend some time thinking about tooling that may enable it. It has happened numerous times that we found a regression only because the CI of a downstream package failed. This way, we would catch it ourselves.