pyOpenSci / python-package-guide

scientific Python package recommendations & guidance curated by pyOpenSci
https://www.pyopensci.org/python-package-guide/
Other
93 stars 51 forks source link

Tests section of the guide (which isn't started yet!) #59

Open lwasser opened 1 year ago

lwasser commented 1 year ago

some very helpful discussion came up in a recent review here about tests and when to include them and what to do with data, etc.

When we work on our tests section we need to consider and weave this into that section.

cc'ing @NickleDave @aragilar on this - thank you both for this feedback!

ucodery commented 1 year ago

I did want to bring up the idea of when to package tests. The current python-package-structure recommends a top-level tests/ alongside the top-level src/ and then mentions how this will usually leave tests out of distributions. It goes on to be explicit about this

We do not recommend including tests as part of your package wheel by default

I completely agree with this point. However, tests in sdists never gets mentioned and I believe that an sdist should include all artifacts necessary for the consumer to execute all tests. [1]

That takes me to my other recommendation that I would like to make: to have tests runnable by consumers not only do the test directory and any configuration files (tox.ini) need to be available but so does any test data. The guide already has the recommendation

If you do include your tests in your package distribution, we strongly discourage you from including data in your test suite directory.

However, any data that is included should absolutely be packaged up and loaded as importlib resources

1) See https://discuss.python.org/t/should-sdists-include-docs-and-tests/14578 and https://discuss.python.org/t/should-building-wheels-from-sdists-be-recommended-behavior/8358 for discussions about things that should be recommended to be in sdists (and which of those don't go in wheels). IMHO

NickleDave commented 1 year ago

Generally agree with this, adding a couple of other points:

ocefpaf commented 1 year ago

I was going to make the same comment about pooch for larger datasets.

Another point is that I disagree with the no- tests policy on wheels. For pure Python packages? Sure. For wheels that ship compiled code? Please send me your tests b/c I want to be sure I can reproduce them on my platform!

CAM-Gerlach commented 1 year ago

The current python-package-structure recommends a top-level tests/ alongside the top-level src/ and then mentions how this will usually leave tests out of distributions.

It will leave tests out of built distributions (wheels, etc) but not source distributions (sdists). So long as that directory is included in the distribution package manifest (automatically so long as its in source control for most modern backends/plugins and not manually excluded, or manually otherwise), if it is outside the import package directory (src) it will be included in sdists but not wheels, which is generally the desired behavior that you describe.

However, tests in sdists never gets mentioned and I believe that an sdist should include all artifacts necessary for the consumer to execute all tests.

Yes, agreed (particularly as a Conda-Forge packager)—I initially had a different opinion, but my participation in the linked Python Discourse thread quickly changed my mind.

That takes me to my other recommendation that I would like to make: to have tests runnable by consumers not only do the test directory and any configuration files (tox.ini) need to be available but so does any test data.

In general, the modern workflow with most backends includes everything checked in by default, unless specifically excluded. Especially for a sdist, where size is much less critical, its better to just include data files for tests that are not too horribly large (perhaps over a ideally up to a few MB, but up to a few tens of MB total). If data is larger, it should be made as minimal as necessary to properly exercise the package's functionality, or if that is simply not possible, it could be downloaded externally using poodle, etc., adding a Pytest marker to those tests that require network access and not running them by default (so downstream distributors can decide whether it makes sense to run them).

However, any data that is included should absolutely be packaged up and loaded as importlib resources

Yup, though if its test-only data, you could potentially get away with making it (import) package resources in the test directory, though you might need to finagle your Pytest config to get it to work right.

lwasser commented 1 year ago

this is a great conversation! thank you all! here is the page / section that i think we are discussing -

Here we talk about the nuance of tests and data being included.

it sounds like a small pr is in order but some of hte above is already in the text. some issues i see are

  1. we aren't distinguishing between wheel vs sdist (what files belong in which)... altho i mention wheel i don't mention sdist explicitly
  2. it doest say don't include data as a heading ...

what language specifically do we want to finesse on this page? and then we need an entire page / section devoted to tests specifically as well. @NickleDave you might recall we had started to add a lot about tests here but we want a tests page devoted to that topic.

So what specifically shall we modify on this page to make it more clear / accurate and inline with what pypa and the python discourse recommends? FWIW all of the package maintainers for the core scientific python tools (that all have extensions / wrap other languages) wanted to ship tests for exactly the reason @ocefpaf mentioned above.

then let's work on a tests page separately that includes more about what goes in sdist vs wheels. let me know what y'all think.

ucodery commented 1 year ago

Another point is that I disagree with the no- tests policy on wheels. For pure Python packages? Sure. For wheels that ship compiled code? Please send me your tests b/c I want to be sure I can reproduce them on my platform!

I sort of get it, but why do you want to test bdists with extension modules, but not pure python bdists? If you are not confident that the developer is responsible to have tested their releases, wouldn't you test all packages? And if you want to test compiled code why not just grab the sdist? I suppose because you may not have the toolchain to compile locally but then the provided binaries might not be able to be tested anyways for any number of reasons; you may very well have to compile the module with different flags to properly test it.

It will leave tests out of built distributions (wheels, etc) but not source distributions (sdists).

Thanks, that clarifies things (and for that reason maybe should go into the guide). Is this true of every builder covered in the guide? I haven't done this sort of comparison exhaustively so it might very well be. I do still think it is an important point to bring up though as we don't want new maintainers to see a recommendation to leave tests out sometimes and then write exclusion rules that take them out for all distribution types.

this is a great conversation! thank you all! here is the page / section that i think we are discussing -

Yes, that is the page I have been alluding to. Although the final work may be broad enough to need to update a few pages.

we aren't distinguishing between wheel vs sdist (what files belong in which)... altho i mention wheel i don't mention sdist explicitly

Exactly. Test distribution is already covered but IMO since only wheel are explicitly mentioned when the guide switches to "package distribution" it's not clear that a separate recommendation is being made for sdists. Also, I don't think the size or type of test should matter - if you decide to distribute tests it is nice if you distribute all of them. Ideally complex/ flaky/ non-isolated tests are marked in some way that the consumer can disable or enable what they want as @CAM-Gerlach mentioned.

CAM-Gerlach commented 1 year ago

I sort of get it, but why do you want to test bdists with extension modules, but not pure python bdists?

Sidenote, but the standard CI testing approach I generally do and recommend as an upstream developer, pure Python or not, and which aligns with best practices as far as I'm aware, is in your CI test pipeline (or using a workflow tool like Tox/Nox/etc,), is to build a wheel for your project (in turn built from a sdist, as python -m build does by default), install that in a fresh env (pip install dist/path-to-wheel.whl), and then run the tests in the local test directory you already have (having checked out the project in order to build) against the installed project, so you don't actually need the tests in the wheel itself but are fully testing it.

By contrast, as a downstream redistributor you're going to be rebuilding the project from source anyway, and will have the sdist or the source tree, so you don't need the tests in the wheel either.

I could envision certain scenarios where it could be useful, but it seems a bit of an edge case vs. having tests in the sdist.

Is this true of every builder covered in the guide? I haven't done this sort of comparison exhaustively so it might very well be.

Assuming you've configured the backend to add the tests (or it does so automatically via VCS), it is a nessesary consequence of how the sdist vs. wheel formats work; sdists contain the entire source tree minus anything excluded, while aside from explicitly specified license files and metadata, wheels contain only the actual import package(s) themselves, so anything outside the package directory is excluded (unless added via the legacy data-files mechanism, or added under one of the other .data keys).

ucodery commented 1 year ago

The changes that I would propose right now:

CAM-Gerlach commented 1 year ago

this means the pyproject.toml at least

Just to note, I meant to mention this before, this is explicitly required by the sdist spec and implemented by all conforming backends, so there's no need to mention this here as it could confuse users or make them think they have to explicitly specify it (as opposed to the other things they should include that may or may not be automatically included depending on their backend).

the sdist should generally include all files that are revision controlled, except for the revisioning data itself (.git, .gitignore...)

I wouldn't even go so far as to suggest excluding .gitignore or .gitattributes, as the former is used by some backends to determine the files to include in the built package, and the data from the latter on how to treat file (as well as other things) can sometimes be used by other tooling. Given their relatively trivial size (and the fact that they won't be included in the binary artifacts that incurs the vast majority of the downloads, since they are outside the import package), it's IMO not even worth excluding them.

ucodery commented 1 year ago

this is explicitly required by the sdist spec

Good to know! I didn't realize this preference made it into any formal specification. It should still be noted though that pyproject.toml is not the end-all-be-all of build files because of the possible dynamic attributes. Although we should not promote FUD in readers, there are compliant sdists on pypi that are nonetheless incapable of building wheels.

I wouldn't even go so far as to suggest excluding .gitignore

Sure, I am comfortable only recommending excluding .git or equivalent. Authors who know what they are doing will likely do what they want to some degree anyways.

ocefpaf commented 1 year ago

I sort of get it, but why do you want to test bdists with extension modules, but not pure python bdists? If you are not confident that the developer is responsible to have tested their releases, wouldn't you test all packages? And if you want to test compiled code why not just grab the sdist? I suppose because you may not have the toolchain to compile locally but then the provided binaries might not be able to be tested anyways for any number of reasons; you may very well have to compile the module with different flags to properly test it.

It is not about trust but the fact that a passing test for a binary built under certain circumstances may fail with the running platform is slightly different. That would be super rare and unexpected for a pure Python package.

CAM-Gerlach commented 1 year ago

Good to know! I didn't realize this preference made it into any formal specification.

Actually, it's not a mere "preference", but is rather the defining feature of the modern sdist format :slightly_smiling_face: , i.e. a .tar.gz containing a pyproject.toml that tells frontends what to do with it, plus other arbitrary files. A sdist without a pyproject.toml is not a modern, standardized sdist but rather an old-style, unstandardized "legacy sdist", and will only be produced when the original source tree is lacking such, resulting in backends assuming legacy, setuptools-based default values for the build-system table.

It should still be noted though that pyproject.toml is not the end-all-be-all of build files because of the possible dynamic attributes.

Sure, though it is the responsibility of the backend to ensure the sdist contains any backend-specific files (e.g. setup.cfg, setup.py, flit.ini, meson/CMake config files etc) used by that backend, and the standard, recommended python -m build (and presumably most other modern frontends) first builds a sdist and then builds the wheels from it, so such problems would be immediately detected (of course, the sdist could still be unbuildable on end user machines for other reasons, e.g. missing external dependencies).

Sure, I am comfortable only recommending excluding .git or equivalent.

Right, though for the sake of pedantry, excluding VCS directories will generally be handled automatically by the backend:

ucodery commented 1 year ago

I believe I've taken us somewhat off-topic. We are not building a backend here that has to make some of these decisions, only making recommendations of current tools. I guess we need to know if all the modern build toolchains already do everything I before proposed as a change to the recommendations (including setuptools - likely using an extension).

If they do then we don't really need to call all this out for readers and what I would probably suggest is to drop the mention of leaving tests out since that's just the normal. If some don't we should probably have examples of how to get this desired behavior out of them.

The other question I have is if we want to make the recommendation that wheels do include tests (some level of smoke checks I assume?). My vote is very much -1 but it appears @ocefpaf and @lwasser and maybe others believe this is a good practice when compiled code is involved.

ocefpaf commented 1 year ago

My vote is very much -1 but it appears @ocefpaf and @lwasser and maybe others believe this is a good practice when compiled code is involved.

I guess that the reason to keep tests for compiled code is not well explained here. The thing is, when a scientist installs numpy, for example, from PyPI, the wheel was built under certain circumstances and the tests for those circumstances passed. However, the local machine is slightly different and the devil is in the details, most of the tests can pass but some may present a precision difference that may lead to different results and/or test failures.

Someone may argue that one should always build their own library when compiled code is involved but I would argue that installing and running the tests (and hope everything passes) is easier than start off by building your own.

The topic of not shipping tests is a light pkg vs completeness most of the time but, when compiled code is involved, there a real danger of different results.

CAM-Gerlach commented 1 year ago

If they do then we don't really need to call all this out for readers and what I would probably suggest is to drop the mention of leaving tests out since that's just the normal. If some don't we should probably have examples of how to get this desired behavior out of them.

Sorry for the confusion—my fault, I was focusing too much on pedantic details. I just meant that there's no need to mention to users that they have to include pyproject.toml or backend-specific files as the backend is required to handle that if they are present, and also that arguably there was no need to tell them to explicitly exclude VCS-related files as all current backends handle that by default too.

However, as for tests, I'm +1 on including a mention that they should be included in the sdists if practical, and +0.5 on ensuring they are excluded from the wheels if practical (with the former taking priority, which will happen by default if they are outside the package per the recommended layout).

The topic of not shipping tests is a light pkg vs completeness most of the time but, when compiled code is involved, there a real danger of different results.

Couldn't the small proportion of wheel users who want to run the tests just also download the sdist of the package, the source archive of the GIt tag or the repo at that tag, and run the tests again the installed wheel from there? Given the overwhelming majority of wheel users will never run the tests, ISTM that the minor amount of additional inconvenience this requires (if any at all, if they already have the repo cloned somewhere) seems to outweigh the additional bandwidth and install time costs for most cases. Or is there something I'm missing here?

ocefpaf commented 1 year ago

Given the overwhelming majority of wheel users will never run the tests

I'm from a time were the install instructions were always: install, run the tests, use it, but I'm old. Still, at least in my bubble, we are very worried about reproducibility and accurate results across experiments in various platforms, so we always run the tests.

One could flip this argument around and say that the dangers of having wrong results are not worth the little extra bandwidth necessary for shipping the tests? I believe scipy and numpy libs are unlikely to stop shipping the tests and removing a direct call like import numpy; numpy.tests().

Note that this is my last comment on this thread b/c I don't feel strongly were this documents goes. Just wanted to point out what part of the scientific community thinks about this issue.

PS: maybe it is worth asking numpy and scipy devs what they think.

CAM-Gerlach commented 1 year ago

Hey @ocefpaf , sorry if I got us too sidetracked into the relative proportions of wheel users who do vs. don't run the tests, and the relative merits thereof. I can certainly see the argument of running the tests against the installed wheel, and in fact that's what I (and many others) do for the projects I maintain, and recommend that others do the same.

My main question was whether it would not simply work for your use case to run the tests themselves from an unpacked sdist, source archive or repo checkout but still against the installed wheel as you want, in cases where tests are not included with the wheel itself (i.e. if the tests directory is outside the import package(s))? It seems it should work, since this is how the tests should be run upstream anyway with this structure (and presumably are, if following these same best practices).

Conversely, if the tests are included in the import package, then they will presumably still be in both the wheel and the sdist, given most backends don't really offer an easy mechanism to include things in one but not the other for things inside the import package itself (you technically can with Setuptools, but I certainly wouldn't call it easy).

So it seems to me that either way, you can still run the tests against the installed wheel, even if the tests themselves are outside the package, no? Or maybe I am missing something here?

ucodery commented 1 year ago

As an attempt to put a pin in this here is my updated proposal:

Also, I want to make sure that sdists include all files necessary to build wheels. I have run into real difficulties with scientific packages in the past that do not do this. Almost always it is because they are generating files during the build process that are not in revision control (setup.py creating the README dynamically...) or that the build system is reading files that are explicitly excluded (MANIFEST.in skips a file that is ready by setup.py). However, I don't think that new projects using new build systems will produce these same quirks, so I am reluctant to confuse our readers with this caution. WDYT?

ucodery commented 1 year ago

I also wonder looking back at the page if the hierarchy could be reconsidered Currently:

This would make a lot more sense to me:

lwasser commented 1 year ago

i hear you! i think the only reason we did it that way was to highlight we suggest src/ layout to avoid some of the confusion between choosing but also that data piece does really fit in another section.

im thinking we should work in a google document where we can have lots of comments and input prior to a pr and we could totally reorg this section. we probably want to create an outline for a large chunk of the guide and then start writing.

maybe we can even have some meetings that are working focused soley on writing?