averater closed this issue 4 years ago
unfortunately the fixture system is completely unaware of such conflicts, and so far no one has set out to come up with a comprehensible way of handling "unique" resource configurations and their dependency/optimization
I'm still a little unclear as to how everything ties together on your end, but I'll present a solution, and you can tell me if I'm missing something.
It sounds like you have several tests that are set up pretty much the same way, except the config is different. It also sounds like you have some tests that should only apply for some configs, but not others. It also looks like your example fixtures show that you have 2 files, and they will either have a 1 or a 2 written to them, to show that they would conflict, as one fixture could overwrite the work of another. You also mentioned that it takes a long time to prepare one of these configs, but that will be handled as a pleasant side effect of the solution I'm presenting.
I'm a bit of a testing purist, so before I dive in, I want to address this:
I would also like this to work through an entire session so we can run all tests in our repos with one config and then change config. Then run next config etc.
The difference between one test and another that does the same setup and takes the same measurements (i.e. assertions) is the independent variables. If you have different independent variables, you have different tests. Differences in configs (in this case at least) constitute different independent variables, which means they are different tests, because you'd be testing something different.
When you treat them as different tests in the eyes of pytest, things get easier. This means each test is already defined and given a separate nodeid, regardless of which config is active. Then, when you only want to run certain tests, because you only want to use certain configs at that moment, you can then just "select" those tests using this technique.
With that in mind, on to the solution.
If you have a common setup with only some variables being changed, and all the tests are the same, then fixture parameterization is the answer. Simply parameterize the right fixture, and everything extends from there. Pytest will automatically handle one param set at a time, so if that parameterized fixture has the right scope, it will only run the minimum amount of times that it needs to, and it will finish one param set before moving on to the next.
In your case, it sounds like that would mean something like parameterizing a session-scoped fixture with the different config starting values (i.e. something that you can pass in to some function/class to kick off the config generation easily). Some abstraction may be required for you to more easily set this up.
The tricky bit now is that you may have certain tests that shouldn't be running against certain config params. This could lead you to using runtime logic to skip tests, but I consider this a bad practice, and there's a much better way, with minimal effort required. All you need to do is leverage multiple conftest.py files, where the parameterized fixture is overridden by another definition of it using only the param sets that apply to all the tests in the directory of that conftest.py.
Going off of your example, let's say you have your top-level conftest.py in tests/ (giving you tests/conftest.py). In there, you have the following fixture:
import pytest

# Config here is whatever abstraction kicks off your config generation
config_permutations = [
    (Config("file1", "1"),),
    (Config("file1", "2"),),
    (Config("file2", "1"),),
    (Config("file2", "2"),),
    (Config("file1", "1"), Config("file2", "1")),
    (Config("file1", "1"), Config("file2", "2")),
    (Config("file1", "2"), Config("file2", "1")),
    (Config("file1", "2"), Config("file2", "2")),
]

@pytest.fixture(scope="session", params=config_permutations)
def write_configs(request):
    for config in request.param:
        config.write_to_disk()
This will parameterize every test inside tests/ that write_configs runs for. You can also just make it autouse to strip down some of the boilerplate fixture requests.
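For example, an autouse version of the same fixture (a quick sketch, using the same Config/config_permutations placeholders as above) would be:

@pytest.fixture(scope="session", params=config_permutations, autouse=True)
def write_configs(request):
    # with autouse, every test under this conftest.py gets parameterized by
    # these configs without having to request write_configs explicitly
    for config in request.param:
        config.write_to_disk()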
Unfortunately, this includes the tests that shouldn't run for some of those param sets. But that's just because we aren't done yet.
Now let's say you have some tests that should only run for configs (Config("file1", "1"), Config("file2", "1")) and (Config("file1", "1"), Config("file2", "2")). Then you can make the folder tests/configs_1121_1122/ (there's probably a more meaningful name you can give it in your actual use case though), and put all those tests in there. But in addition to this, you can define another conftest.py in that folder (giving you tests/configs_1121_1122/conftest.py) with the following fixture defined in there:
import pytest

config_permutations = [
    (Config("file1", "1"), Config("file2", "1")),
    (Config("file1", "1"), Config("file2", "2")),
]

@pytest.fixture(scope="session", params=config_permutations)
def write_configs(request):
    for config in request.param:
        config.write_to_disk()
Now this write_configs will override the one in tests/conftest.py, so the tests in tests/configs_1121_1122/ will only run with those 2 configs.
You can repeat this approach for all the other tests that can't run under all config variations.
If you're using pytest-xdist (which it sounds like you're not, but it can't hurt to cover this briefly), then you can group the tests into workers where each worker handles only a single config, using the approach here (because the param ID will be in each test's nodeid): https://github.com/pytest-dev/pytest-xdist/issues/18#issuecomment-392558907. This is made possible because each test is given an actual nodeid, rather than relying on passing in config details at launch to determine what the tests do.
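As a rough, untested sketch (and to be clear, this is a different mechanism than the one in that linked comment): newer pytest-xdist versions support --dist=loadgroup with an xdist_group mark, so a conftest.py hook could keep each config param on a single worker like this:

import pytest

def pytest_collection_modifyitems(items):
    for item in items:
        # callspec only exists for parameterized tests; its id is the "[...]"
        # part of the nodeid, i.e. the config param id
        callspec = getattr(item, "callspec", None)
        if callspec is not None:
            item.add_marker(pytest.mark.xdist_group(name=callspec.id))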
The only issue is that the alternate fixture versions will probably run multiple times, even if only one config is being "selected". By that I mean (Config("file1", "1"), Config("file2", "1")) will be used once for the write_configs defined in tests/conftest.py for all the relevant tests, and then it would be used again for the write_configs in tests/configs_1121_1122/conftest.py for all the tests in the tests/configs_1121_1122/ folder.
This can definitely be solved, and there's more than one solution for this part. However, I'll let you read through and see if you can find the answer (maybe something with singletons 🤔 ). If you're not sure, let me know and I can give some ideas.
Thank you very much for your replies.
@SalmonMode I made an oversimplified example. Our test configuration is mostly based on one config file with ~1000 bytes, each having 256 different possibilities. But there are more ways to configure the system as well. So any sorting based on config is not doable, and any permutation of possible configs will be too big for any test system. Also, I would like to have a more general solution where tests are placed in folders based on what function they are testing and not based on what config they require.
One of the beauties of pytest is that we can write requirements right next to the test with fixtures or marks. We would really like to use that. I have looked into the source code and into some options but have not come up with a good solution.
This works, but it will not minimize the number of options. So option1 will, in the example below, be called three times instead of two. Also, I have to write "option1" three times for each test.
import pytest

@pytest.fixture
def option1(request):
    set_option_1(request.param)

@pytest.mark.parametrize('option1', [1], indirect='option1')
def test_something1(option1):
    pass

@pytest.mark.parametrize('option1', [2], indirect='option1')
def test_something2(option1):
    pass

@pytest.mark.parametrize('option1', [1], indirect='option1')
def test_something3(option1):
    pass
Something like this is what I would like. But it would require a larger change to pytest so it is probably not doable soon (unless someone else more into pytest can/will help).
@pytest.resource
def option1(param):
    set_option_1(param)

@pytest.mark.option1(1)
def test_something1():
    pass
I wouldn't worry about pytest handling all the possible config permutations. If you are worried, it might be worth a look to see if your testing approach is apt, as there may be tests that either don't need to be run at all, or they could be structured far more efficiently.
Rest assured, though, as there's almost always a pure pytest solution, and I'm happy to help find it.
After reading through your original post again, it also sounds like the hardware only has one config "file", so to speak, rather than multiple. This makes things quite a bit simpler. But correct me if I'm wrong.
Ultimately, you have a large number of configurations, and then you have a large amount of different hardware. Some pieces of hardware are incompatible with some configurations.
The part that's confusing me is where you said:
That has the downside that tests are run multiple times as many combinations of configurations are applicable for some of our tests leading to longer test times.
I originally thought you wanted to run all tests on all hardware, under every config that each hardware is compatible with. I'm starting to think that isn't the case.
Are you looking to only run each test function once, but certain tests can only be run on certain hardware with certain configurations?
Something like this is what I would like. But it would require a larger change to pytest so it is probably not doable soon (unless someone else more into pytest can/will help).
You can access marker values from fixtures using get_closest_marker - a somewhat similar example I wrote recently:
import pytest

import mod  # the module under test


@pytest.fixture
def parsed_args(request):
    marker = request.node.get_closest_marker("cli_args")
    if marker is None:
        args = ...  # some suitable default for unmarked tests
        # could also raise an error instead?
    else:
        args = marker.args[0]
    return mod.parse_args(args)


@pytest.mark.cli_args(['--verbose', 'filename'])
def test_verbose(parsed_args):
    mod.run(parsed_args)
That shouldn't be necessary.
Aside from the built-in pytest.mark stuff (e.g. pytest.mark.parametrize), I find marks are best for providing metadata to filter on after the fact with pytest -m, rather than trying to use them to alter what was collected to begin with or how they operate. But that filtering mechanic is exactly what I think can solve this problem (or at least a big part of it).
If the answer to my question above is "yes", then I think the most straightforward route would be to define each test for each applicable hardware configuration. Using indirect with pytest.mark.parametrize does mean the fixture you're indirectly parameterizing will be executed again, but that doesn't necessarily mean the device will have to restart or be reconfigured, which is why I mentioned singletons earlier (as a hint 😉).
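For instance, here is a minimal sketch of that singleton idea, where device_config and apply_config are hypothetical stand-ins for however you actually push a config to the device:

import pytest

_current_config = None  # module-level "singleton": the config currently applied

@pytest.fixture
def device_config(request):
    global _current_config
    wanted = request.param
    if wanted != _current_config:
        apply_config(wanted)  # hypothetical: the expensive reconfigure/restart
        _current_config = wanted
    return wanted

The fixture still executes for every indirectly parameterized test, but the expensive part only happens when the requested config actually changes.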
If the fixtures for the configs are parameterized, you can give the params ids (or just rely on their __repr__/__str__, I forget which at the moment), which can be filtered on with pytest -k, or you can use the pytest_collection_modifyitems hook to mark them based on the config param used so you can filter with pytest -m.
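For example, a sketch of the ids idea, reusing the earlier write_configs fixture (the filename/value attributes on Config are assumptions about your abstraction):

@pytest.fixture(
    scope="session",
    params=config_permutations,
    ids=lambda configs: "-".join(c.filename + c.value for c in configs),
)
def write_configs(request):
    for config in request.param:
        config.write_to_disk()

Those ids end up in each test's nodeid, which is what pytest -k matches against.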
With that in place, you can simply target specific hardware configurations for a single test run, in order to make sure they all happen together and the device is only configured/restarted once per config.
I understand the concern about having to do @pytest.mark.parametrize('option1', [1], indirect='option1') in a bunch of places, but there are better ways to structure it so you don't have to do that as often, and I would also do @pytest.mark.parametrize('option1', [1], indirect=True) instead. You would also only need to do it for tests or groupings of tests that are the exception and can't run on all configurations, as the rest can just have tests defined for all hardware configurations.
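For example (a sketch), if every test in a module is one of those exceptions and needs option1 set to 1, the mark can be applied once at module level instead of per test:

import pytest

# applies to every test function in this module
pytestmark = pytest.mark.parametrize("option1", [1], indirect=True)

def test_something1(option1):
    ...

def test_something3(option1):
    ...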
I also highly recommend against manipulating the way pytest collects tests, as every time I've seen that done, it's only resulted in a more complex and difficult to maintain system.
Note: filtering tests the way I described happens after pytest has collected all the tests. They are separate phases, and filtering is a primary feature of pytest, so it is automatically accommodated in all aspects of pytest, so long as it's done through the standard means. Modifying how pytest collects tests isn't accommodated in all aspects of pytest, so while it's technically supported, the burden of maintaining that is placed on the user, and it's expected that they know what they're doing and are aware of all its implications. This is part of why I avoid modifying how it collects tests.
Closing this as an inactive question.
why not have an option to group tests that require incompatible fixtures apart? maybe have a scope "tests that use this fixture" or something. it would be nice to be able to write something like...
from contextlib import contextmanager

import pytest

dogs = set()

@contextmanager
def expensive_operation_with_dog_named(name):
    dogs.add(name)
    yield
    dogs.remove(name)

# "tests that use this fixture" is the proposed scope, not an existing one
@pytest.fixture(scope="tests that use this fixture")
def charlie():
    with expensive_operation_with_dog_named("charlie"):
        yield

@pytest.fixture(scope="tests that use this fixture")
def buddy():
    with expensive_operation_with_dog_named("buddy"):
        yield

def test_charlie(charlie):
    assert dogs == {"charlie"}

def test_buddy(buddy):
    assert dogs == {"buddy"}

def test_charlie_again(charlie):
    assert dogs == {"charlie"}
in my case expensive_operation_with_dog_named is actually starting a rather expensive service that i can't run in 2 instances
@oakkitten you can do this by leveraging scope already. You can use a sub-package or just a module to house the tests that need that expensive fixture, and have the fixture set to that scope (e.g. package if it's for the package, or module if you're using a module).
A fixture will only run once a test that requests it is starting to execute, unless it has autouse set to true. So in this case, it looks like expensive_operation_with_dog_named would run 3 times. But you can have it run only once for charlie (even though there are 2 tests that use it) and once for buddy by using a larger scope in the way I mentioned.
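As a rough sketch of that (assuming the charlie tests get their own module, and that the helper and the dogs set live in some shared module; the import path here is made up):

# tests/test_charlie.py
import pytest

from dog_helpers import dogs, expensive_operation_with_dog_named  # hypothetical shared module

@pytest.fixture(scope="module")
def charlie():
    # runs once for this module, i.e. once for both charlie tests
    with expensive_operation_with_dog_named("charlie"):
        yield

def test_charlie(charlie):
    assert dogs == {"charlie"}

def test_charlie_again(charlie):
    assert dogs == {"charlie"}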
i can, but it's rather inconvenient. it's perhaps ok if the kind of dog is one of the most important qualities of a test, but what if it's rather insignificant? you are now grouping tests by some insignificant quality of them.
and what if you have a test that needs not only a dog but also a cat, which is also expensive? i suppose you'd have to have nested test structures now and it's just getting more complicated.
grouping tests by incompatible fixtures automatically, rather than manually, seems like a cleaner solution and not too difficult to implement... why not?
edit: here's an awkward idea that at least works
from contextlib import contextmanager

import pytest

dogs = set()

@contextmanager
def expensive_operation_with_dog_named(name):
    dogs.add(name)
    yield
    dogs.remove(name)

@pytest.fixture(scope="session", params=["charlie", "buddy"])
def dog(request):
    with expensive_operation_with_dog_named(request.param):
        yield

@pytest.fixture()
def charlie(dog):
    if dogs != {"charlie"}:
        pytest.skip()

@pytest.fixture()
def buddy(dog):
    if dogs != {"buddy"}:
        pytest.skip()

def test_charlie(charlie):
    assert dogs == {"charlie"}

def test_buddy(buddy):
    assert dogs == {"buddy"}

def test_charlie_again(charlie):
    assert dogs == {"charlie"}
Ah, I think I understand now.
Parameterizing is great for when you want to make sure different contexts/input data result in the same behavior and output/state. For example, parameterizing browsers for e2e testing, as one would still expect identical behavior in different browsers (from the perspective of the test, at least).
But if you do it only because it lets you repeat yourself less (DRY), and you are triggering different behaviors because of different inputs (e.g. charlie vs buddy), then it's less clean. It violates DAMP and KISS. Based on the fundamentals of how the framework operates, it either implies test cases that don't actually exist and dependencies that shouldn't be there, or it requires more complex logic (complex meaning more diverse behavior in a smaller amount of code), and complexity is the enemy; or you manage to have both problems.
The cleanest possible solution is having two fixtures. The first example you gave is actually quite possible, and very clean. Assuming that first snippet you provided was the entire test file, you can just make the scope for charlie and buddy be "module", and you're done.
The ideal is grouping tests by the behavior under test, which is why we use fixtures to control that grouping, as a fixture often represents the triggering of a particular behavior, no matter how insignificant.
Also, if you have a test that needs both cat and dog, then you can have the relevant cat fixtures depend on the relevant dog fixtures, so you can capitalize on the dog fixture having already been run. I like to think of it like a big XML document, so you'd have your cat tag inside the dog tag. If you have some tests that only needed the cat stuff, but not the dog stuff, you can still have them be inside the dog stuff, but then if you run them in isolation, they'll still be executing the dog stuff when they don't need to. And of course, the tests that only need the cat stuff would still be running with unnecessary dependencies when running the full test suite, which introduces confounding variables.
There is no optimal solution where a test framework can be aware of these kinds of dependencies and switch them on and off based on when a test happens to be executing, because then your tests, by definition, won't be idiomatic or deterministic.
Edit: instead of "deterministic", the correct word to use is "linearizable" (for anyone reading in the future)
Parameterizing is great for when you want to make sure different contexts/input data result in the same behavior and output/state. For example, parameterizing browsers for e2e testing, as one would still expect identical behavior in different browsers (from the perspective of the test, at least).
i only used parametrizing to demonstrate that this problem can be easily solved. it's the wrong tool to use, but the end result is that the expensive operation is run the minimal number of times and the structure of the tests does not depend on pytest internals. all without changing pytest code!
The first example you gave is actually quite possible, and very clean. Assuming that first snippet you provided was the entire test file, you can just make the scope for Charlie and buddy be "module", and you're done.
if you do that, all tests but the first one fail
    def test_buddy(buddy):
>       assert dogs == {"buddy"}
E       AssertionError: assert {'buddy', 'charlie'} == {'buddy'}

    def test_charlie_again(charlie):
>       assert dogs == {"charlie"}
E       AssertionError: assert {'buddy', 'charlie'} == {'charlie'}
did you have something different in mind?
Also, if you have a test that needs both cat and dog, then you can have the relevant cat fixtures depend on the relevant dog fixtures
if cat is unrelated to dog, this would lead to some very awkward code. there wouldn't be only "tom" now, but "tom and buddy". for buddy, charlie and no dog that's 3 fixtures instead of one?
There is no optimal solution where a test framework can be aware of these kinds of dependencies and switch them on and off based on when a test happens to be executing, because then your tests, by definition, won't be idiomatic or deterministic.
here's a working example with cats and dogs, i just copy-pasted half the code:
from contextlib import contextmanager

import pytest

dogs = set()

@contextmanager
def expensive_operation_with_dog_named(name):
    dogs.add(name)
    yield
    dogs.remove(name)

@pytest.fixture(scope="session", params=["charlie", "buddy"])
def dog(request):
    with expensive_operation_with_dog_named(request.param):
        yield

@pytest.fixture()
def charlie(dog):
    if dogs != {"charlie"}:
        pytest.skip()

@pytest.fixture()
def buddy(dog):
    if dogs != {"buddy"}:
        pytest.skip()

cats = set()

@contextmanager
def expensive_operation_with_cat_named(name):
    cats.add(name)
    yield
    cats.remove(name)

@pytest.fixture(scope="session", params=["tom", "simba"])
def cat(request):
    with expensive_operation_with_cat_named(request.param):
        yield

@pytest.fixture()
def tom(cat):
    if cats != {"tom"}:
        pytest.skip()

@pytest.fixture()
def simba(cat):
    if cats != {"simba"}:
        pytest.skip()

def test_charlie_tom(charlie, tom):
    assert dogs == {"charlie"}
    assert cats == {"tom"}

def test_charlie_simba(charlie, simba):
    assert dogs == {"charlie"}
    assert cats == {"simba"}

def test_buddy_tom(buddy, tom):
    assert dogs == {"buddy"}
    assert cats == {"tom"}

def test_buddy_simba(buddy, simba):
    assert dogs == {"buddy"}
    assert cats == {"simba"}
this seems very deterministic and as idiomatic as it can get? a test simply depends on a cat and a dog, and you can put tests in any order and cats and dogs in any order. pytest is even smart enough to run the expensive operation a total of 5 times!
(it would be nice to have a say about which fixture is more expensive, a cat or a dog, so that pytest could minimize the number of times it's run. in this example, pytest runs, in order, charlie, tom, buddy, simba, charlie. reordering stuff gives a different order, but pytest seems to be bent on making more dogs than cats. apparently, the order depends on the name of the parametrized fixture. it seems that the name that's closer to 'a' gets priority, so as cats come before dogs, the cat fixture runs less. renaming cat to xat flips this around. so it's already possible with pytest!)
Ah I see why changing the scope didn't fix it.
Unfortunately, your solution isn't deterministic, because it's dependent on test execution order, in that the changes to the system state are not the same for a given test if that test is run in isolation versus the entire suite being run.
You should never rely on test execution order, because it means that you're likely introducing unnecessary dependencies and confounding variables, aren't engineering a test around a given behavior effectively, and lose the ability to run your tests deterministically. Pytest also doesn't have a set way of ordering tests, and how it happens to order them is up to the current implementation in certain areas of the code.
Even if you pinned pytest to a specific version and identified a (currently) deterministic sorting order for the tests, you'd still be beholden to the order you define your tests in and how they're named, which will very quickly become a nightmare to maintain and extend, and also wouldn't be idiomatic, nor would it be Pythonic.
Attempting to "repair" state after a given test is tempting, but unfortunately means you're still manipulating the state in various ways differently. It also means you're adding complexity and assuming that you will be perfect at undoing what was done, which is almost never the case.
For example, if using Selenium and you want to share the same browser between two different tests to save time, you might attempt to wipe the history, cookies, and cache, but that doesn't work as browsers are incredibly complex, and there will always be something left behind. That's why it's recommended to always make a fresh driver session for each test.
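For reference, the per-test driver pattern I mean is just this (a sketch using Selenium's Python bindings):

import pytest
from selenium import webdriver

@pytest.fixture
def driver():
    # a brand new browser session for every test, torn down afterwards, so no
    # history/cookies/cache can leak from one test into the next
    driver = webdriver.Chrome()
    yield driver
    driver.quit()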
I understand the desire to speed up very slow tests, but there's always a better way. Can you provide some more context on what dog/cat are and what the expensive operation is? I'm sure I can help come up with a more effective structure.
Edit: instead of "deterministic", the correct word to use is "linearizable" (for anyone reading in the future)
it's dependent on test execution order
but it's not? my very idea is to not rely on test structure (or order) to get the same results. this working example (again, this is only to demonstrate that pytest can easily do it) demonstrates that you can have tests in any order and have test arguments in any order, really you can have anything in any order here and it will work the same way any time.
and how they're named
now this is an unrelated problem that already exists in pytest. how tests are run is already depending on the names of the fixtures. it would be best to create another issue for this, though.
Attempting to "repair" state after a given test is tempting
i'm assuming you are talking about dogs.remove(name) here. in my case i'm starting and stopping a rather expensive service (an irc network and accompanying services) which i can't easily run several instances of.
it's dependent on test execution order
but it's not?
The tests are dependent on test execution order in that they could only be considered deterministic based on test execution order being deterministic. Determinism is dependent on both behavior and outcome. 0 + 3 - 3 + 2 may have the same result as 0 + 2, but it's not the same behavior.
If I run your tests as a suite, it's sort of like doing 0 + 1 - 1 + 3 - 3 + 2 for the setup of a given test. If I run that test in isolation, though, only 0 + 2 would happen.
That said, given this is more to be representative of an IRC network starting/stopping, then it's a different story. But I have to ask why not have a package level fixture in a conftest that launches the IRC network for the tests that depend on it, and have all the tests that depend on it be the only ones in that package?
and how they're named
now this is an unrelated problem that already exists in pytest. how tests are run is already depending on the names of the fixtures. it would be best to create another issue for this, though.
Not quite. It's not actually a problem.
Pytest doesn't actually care about order, or naming, and never will. I only mentioned it because, as I said, the determinism of your tests is dependent on test execution order.
The only thing you can guarantee related to test order is test batching. Tests within a given scope will always be run together (e.g. all the tests in a class will be run together, but in an effectively random order). The execution ordering within that group is irrelevant.
Order of tests should always be considered nondeterministic, but not the order of fixtures. Fixture order is determined by scope, which fixtures request which other fixtures, and whether or not a fixture is autouse.
Scope can be leveraged to ensure, not an ordering of tests, but a common context under which a group of tests can run. In other words, it lets you perform your "arrange" steps, and then the action, and then run multiple asserts against that resulting state without having to repeat the steps. The order of the asserts shouldn't matter, because the asserts should be using non-state-changing queries only.
They can also be used to ensure that immutable resources (or effectively immutable resources, i.e. those that could be mutated, but aren't) are only created/calculated once for the entire scope, which can save a lot of time. For example, starting an IRC network for several tests to leverage.
I'm guessing, but I think these last two bits are the secret ingredient you need. Granted, I'm only going off the context that some tests require your IRC network, while others don't.
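To make that concrete, here is a sketch of the package-level idea (start_irc_network and stop are placeholders for however your network actually gets launched and shut down):

# tests/irc/conftest.py -- shared by every test inside tests/irc/
import pytest

@pytest.fixture(scope="package")
def irc_network():
    network = start_irc_network()  # placeholder: the expensive startup
    yield network
    network.stop()                 # placeholder: shutdown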
i'm not sure i understand the general idea. you definitely can make tests that depend on the order of them, and it's definitely a bad idea, isn't it? e.g.
foo = set()

def test_a():
    assert foo == set()
    foo.add("a")

def test_b():
    assert foo == set()
here the tests are executed in the order that they are defined, and the order matters. in the examples that i posted the tests are executed in the order that pytest defines for parametrized fixtures, and the definition order doesn't matter... this is the default and at least partially documented behavior of pytest, isn't it? if there's a problem with this behavior, well, it's already affecting existing tests, isn't it?
you definitely can make tests that depend on the order of them, and it's definitely a bad idea, isn't it?
Yes, absolutely. But what I'm saying is that, while it's not your intent to make the tests dependent on test execution order, they still are because that's the only way they can be considered deterministic.
In order for them to be deterministic, a latter test has to always have a setup of 0 + 3 - 3 + 1 - 1 + 2 (as is the case when run as part of a suite), or 0 + 2 (as is the case when run in isolation), but not both, even though the only part it cares about is <last_value> + 2.
In order for a test to be considered deterministic, it has to have the exact same setup when run as part of a test suite as it does when run in isolation.
Since you were saying that the set being updated was just a placeholder for a network starting and stopping, this doesn't apply, as the state of the running system would not have been modified from one test to another. Had this been an actual object, then it would have applied.
in the examples that i posted the tests are executed in the order that pytest defines for parametrized fixtures and the definition order doesn't matter... this is the default and at least partially documented behavior of pytest, isn't it?
It is not. At least, not exactly. Pytest really only cares about grouping, not order.
As you parameterize a fixture you create a set of bubbles, where each bubble is one param set. Everything that was affected by that parameterization effectively gets a copy made and inserted into each bubble. These bubbles are now groupings. If you parameterize something that is affected by parameterization, then more bubbles get made, and copies of each of those bubbles are copied into the previous bubble, like a fractal.
Pytest basically is only concerned about maintaining the ordering of groupings, so each of those bubbles get executed in one shot, but it's not particularly concerned about the order within a bubble. The groupings are pretty much determined by dependencies, so as long as you specify all dependencies explicitly (e.g. fixture a depends on b and c, fixture b depends on c and d, and fixture c depends on fixture d, giving you d -> c -> b -> a), then you'll have complete control over the order of operations for any given test, but not the test execution order.
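A tiny sketch of that d -> c -> b -> a example, with every dependency explicit:

import pytest

@pytest.fixture
def d():
    return "d"

@pytest.fixture
def c(d):
    return d + "c"

@pytest.fixture
def b(c, d):
    return c + "b"

@pytest.fixture
def a(b, c):
    return b + "a"

def test_order_of_operations(a):
    # pytest has to set up d, then c, then b, then a for this test
    assert a == "dcba"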
One other issue is that you're leveraging the global namespace for setting up dependencies. In pytest, the global namespace isn't really supposed to be used. Everything is meant to be done through fixtures. By stepping outside the fixture system, things become a tad janky.
to reiterate a bit on our conversation from earlier
first of all, regarding the cats and dogs example where expensive operations were run 6 or 7 times. i tried several python versions but i only got 5 expensive operations. if you launch this repl.it, it should produce...
test_cats_dogs.py dog charlie
cat tom
.sssdog buddy
ss.scat simba
sss.dog charlie
i would like to know how i can reproduce the behavior where 6 or 7 expensive operations are run.
regarding determinism, what i understand with it is that tests that run in the same environment and with the same command line arguments should always produce the same results. the idea is, if there's an error in the test, you can catch it, and if you fixed it, you can verify that it is fixed. if you look at my silly example here, these tests produce the same results every time, unless you run them in isolation. so these tests are deterministic. the fact that in isolation these tests produce different results is only a matter of my incompetence, not the fault within the test system itself.
as far as i know, this holds for my other examples as well. so the tests in them are deterministic.
again, while my examples do work and are deterministic, they are only mock examples that only demonstrate that pytest can already do it. if this is ever properly implemented, pytest could guarantee some of the minor details. this would only be a different guarantee in place of the existing one—one arising from the fact that parametrized fixtures are executed deterministically.
regarding the complexity. while my silly example does use branching logic (if ...: pytest.skip()), this logic is not part of the tests, or fixtures, or the other code, but part of the "plugin" or potentially the testing framework itself. my "ideal" example in my first comment didn't use any branching logic, or parametrization. i do not see how it would be more complex.
regarding the scoped solution. all in all, there are three levels of abstraction in the tests here:
1. the tests themselves. they need (apply to) some dogs or cats, but they don't really care about the fact that dog charlie cannot exist at the same time as dog buddy.
2. the fixtures that these tests use (charlie, tom). here a fixture dog cares that only one dog should exist at a time, but it doesn't know or care about the same property of fixture cat.
3. the expensive operations that produce cats and dogs.
so considering your example, there are multiple problems:
class TestConfigA:
    @pytest.fixture(scope='class')
    def charybdis_config(self):
        return {"dog": "charlie", "cat": "tom"}
first of all, there's "TestConfigA" (or "TestCharlieTom", or "configs_1121_1122", from earlier comments). if you are lucky, you can assign here a name that is meaningful. but if you are unlucky, "TestCharlieTom" is as good as it gets. so you are now stuck with a useless name and you might as well add a comment explaining why you need it in the first place. now the reader of the test is aware of the minor details of how dogs work. this is a good example of an abstraction violation.
then, you have cats and dogs in one place, even though these are unrelated. that would be, uh... a violation of separation of concerns?
furthermore, this is a violation of don't repeat yourself.
and finally, this code runs the expensive operation 8 times. as it's nested, as far as i see, it can't be optimized to run the expensive operation 5 times, which is the possible minimum. in this particular example, this is now just as slow as simply having function-scoped fixtures. i don't think there's any advantage at all here.
i would like to know how i can reproduce the behavior where 6 or 7 expensive operations are run.
Run this to install the branch of pytest I've been working on with the maintainers that resolves some fundamental bugs in how fixtures are cleaned up. It's just waiting on some code review (and probably a migration strategy), but it will eventually be merged in.
pip install git+https://github.com/blueyed/pytest.git@fixture-stack
Then put a print inside dog and cat. After you run it, you should see 7 print statements were executed (run it with -vs to make sure the print statements aren't captured). Then have dog request the cat fixture to swap the order around and run it again, and you'll see 6 print statements were executed.
regarding determinism, what i understand with it is that tests that run in the same environment and with the same command line arguments should always produce the same results. the idea is, if there's an error in the test, you can catch it, and if you fixed it, you can verify that it is fixed. if you look at my silly example here, these tests produce the same results every time, unless you run them in isolation. so these tests are deterministic.
A test must have the same result, and operate exactly the same when run in isolation as when run as part of a suite. If they don't, then they are different tests because one relies on other tests that came before it and the other does not. Whether or not you consider this deterministic is not important. What is important is that it's an inconsistent test.
Developers will be depending on a test operating consistently between these two contexts so they can iterate rapidly. Many developers will be running a single test in isolation because it failed and they're trying to test a potential fix. If it passes in isolation, but fails as part of a suite, that will only cause frustration and cost valuable time.
the fact that parametrized fixtures are executed deterministically.
Do not depend on this. This would be banking on implementation and behavior that isn't specifically intended by pytest. It just happens to work out that way at this current moment.
regarding the complexity. while my silly example does use branching logic (if ...: pytest.skip()), this logic is not a part of tests, or fixtures, or the other code, but the part of the "plugin" or potentially the testing framework itself.
It's in a fixture, and therefore part of your test. You're not using a plugin. You're using a shortcut that raises an exception after an expensive operation has been performed.
first of all, there's "TestConfigA" (or "TestCharlieTom", or "configs_1121_1122", from earlier comments). if you are lucky, you can assign here a name that is meaningful. but if you are unlucky, "TestCharlieTom" is as good as it gets. so you are now stuck with a usless name and you might as well add a comment explaining why you need it in the first place. now the reader of the test is aware of the minor details of how dogs work. this is a good example of an abstraction violation.
The test's name would usually be a combination of the nested namespaces it's in, with each level providing more context, and eventually explaining the arrange/act/assert being done. There's no real test case here to put into words, so these are all placeholder names. So this is moot and irrelevant to the discussion. The point of my examples was to demonstrate a concept surrounding using logical structures to create more concrete isolations between tests. If you want to provide me with an actual test that you want to run, I'd be happy to show how I would name things.
then, you have cats and dogs in one place, even though these are unrelated. that would be a uh... a violation of separation of concerns?
This is the test scenario you provided. If you don't want them tested together, you'd have to come up with another scenario. If you're talking about the fact that I used a dict to house them together, remember that it's a quick mockup with placeholder data. I could have used two fixtures if that would've made more sense to you. But I can only do so much with a limited, abstract example scenario.
futhermore, this is a violation of do not repeat yourself
As I mentioned before, DRY is only good as long as you don't violate DAMP and KISS. Code golf is fun, but not helpful for writing actual code. DRY is not a law; it's a reminder that if you're repeating complex blocks of code often, it likely means you need to abstract.
The argument that I'm not following DAMP in my examples is moot, because I wasn't demonstrating actually descriptive and meaningful locations, but rather where and how they could be used in such a structure.
and finally, this code runs the expensive operation 8 times. as it's nested, as far as i see, it can't be optimized to run the expensive operation 5 times, which is the possible minimum. in this particular example, this is now just as slow as simply having function-scoped fixtures. i don't think there's any advantage at all here.
Honestly, this is starting to come off as a little combative, so I'd like to apologize if you feel I've come off that way towards you, and insert a reminder that my goal here is to set you up with something that is sustainable, to keep you from prematurely optimizing, to identify better ways to optimize, and to help point you in the right direction for a testing mindset.
I had to leave our chat earlier, but I have another solution I wanted to present but didn't have the time.
My philosophy is that you should never define a test that will never run, because it signals that there's a deeper issue. It indicates you're in an XY problem. You're focused on parameterizing fixtures to be more DRY, when DRY is a very bad thing when it comes to test definitions. But the solution may have nothing to do with fixtures at all.
You said you were working with IRC networks and that spinning multiple ones up wasn't a problem. Depending on what exactly you're testing, you could consider those networks infrastructure, and infrastructure should not be established by your tests. They should assume it's already in place, and just rely on configs to be pointed at it. So rather than focusing on how to most optimally spin up and shut down the networks, just spin up the ones you want to run tests against, and kick off the relevant tests.
You can even mark your tests based on the networks they're supposed to run against, so you can launch them by running something like pytest -m 'charlie and simba' (I may be remembering this syntax wrong, but that's the basic functionality).
That said, this really does depend on a lot of context I just don't have, so YMMV. If I had more context, I may be able to provide more useful suggestions.
pip install git+https://github.com/blueyed/pytest.git@fixture-stack
i can confirm this regression. it's probably a non-issue as pytest git master doesn't have it and the branch fixture-stack is 434 commits behind master, but just in case, i reported it.
either way, this is a development branch on a forked repo. i'm not sure how valuable it can be in the scope of this discussion.
regardless of that, the tests still pass, don't they?
Whether or not you consider this deterministic is not important. What is important is that it's an inconsistent test.
up to this point you were criticizing my proposal saying that it is not deterministic. so i suppose the question whether or not the relevant tests are deterministic or not is an important one. do you consider these tests non-deterministic?
the question whether or not these tests are consistent, that is if they can be run in isolation with the same result or not, as i understand, is not related at all to the question whether or not they are deterministic or not. you are saying that the tests in my proposal are inconsistent. would you mind showing this inconsistency? this seems to work fine:
$ pytest -k tom && \
pytest -k charlie && \
pytest -k buddy && \
pytest -k simba && \
pytest -k charlie_tom && \
pytest -k charlie_simba && \
pytest -k buddy_tom && \
pytest -k buddy_simba && \
echo "a few combinations passed"
Do not depend on this
neither my proposal, nor my silly example, depends on the fact that parametrized fixtures are executed deterministically.
that isn't specifically intended by pytest
since we are talking here about the ordering of fixture execution (which i am not depending on), something that is probably determined solely by python code, and perhaps the file system, i don't see how pytest wouldn't have to go out of its way to make tests non-deterministic, by e.g. using random. again, this is quite irrelevant to my proposal.
It's in a fixture, and therefore part of your test. You're not using a plugin. You're using a shortcut that raises an exception after an expensive operation has been performed.
again, this is a silly example that is only meant to demonstrate that pytest can easily do it. it is not in any way production level (or even alpha level) code. it is a single-file proof of concept. in the world of real tests it doesn't exist. it is not found on any abstraction level of the tests. conceptually, it's a part of a plugin (that doesn't exist yet) or pytest itself (if this issue is resolved). i'll repeat, it is not real code! just a proof of concept.
sorry if this sounded rude, if it did, i didn't intend for it to be that way. but it seems to me that you are discussing my toy example as a real-life solution to my trying to run a few irc networks with incompatible fixtures. and i'm just in a weird position of defending my very much abstract proposal to solve the issue of conflicting fixtures.
i don't really have a problem in my tests. while launching irc networks is expensive, my tests take less than a minute to run, all in all. i'm not going to use parametrization to speed up tests. again, using parametrization to optimize this is silly. it's just a proof of concept.
i hope i made myself clear. if i did, you can probably ignore the rest of this, but just to be completely clear:
to keep you from trying to prematurely optimize
the whole point of my proposal is optimization. currently i'm running one network instance per test and it works just fine. this won't solve any other problems, as there aren't any. this can only serve to speed things up.
You're focused on parameterizing fixtures to be more DRY
again, the use of parametrizing is only a proof of concept. it's not real code. i'm not really focusing on parametrizing to be more DRY. in fact, i'm not trying to be DRY at all. it's a silly proof of concept. it only has to work. it's not even that DRY as i do have to repeat myself a bit.
that spinning multiple ones up wasn't a problem.
i can't easily spin up more than one. which is exactly the problem. i wouldn't mind spinning up several if i could, that's what i might have said. it's just that so many things are baked into the service that it's hard to run more than one instance.
P.S. i still think that this kind of conversation is best suited for irc
i can confirm this regression. it's probably a non-issue as pytest git master doesn't have it and the branch fixture-stack is 434 commits behind master, but just in case i reported it.
Some stuff went down a little while back, so this got put on the backburner, but it's actually a pretty serious issue IMO. It was actually merged in for 5.3.3 and then we had to back it out in 5.3.4 because of an implementation detail (I mistakenly used a set instead of a list, which had some interesting consequences if dependency chains between fixtures weren't explicitly defined) and the fact that a lot of stuff was accidentally dependent on the inconsistent teardown logic, so a strategy was needed to make sure there was a smoother transition the next time around.
since we are talking here about the ordering of fixture execution (which i am not depending on), something that is probably determined solely by python code, and perhaps the file system, i don't see how pytest wouldn't have to go out of its way to make tests non-deterministic, by e.g. using random. again, this is quite irrelevant to my proposal.
To clarify, I'm not saying the current implementation of that behavior is nondeterministic. I'm saying the way the ordering happens to break down within one of those bubbles in a given release isn't intended despite being deterministic. Another deterministic implementation could be made in another release, or another version of python could change something that causes that ordering to be different, yet still deterministic given that release and version of Python.
My point here is that it's not something you should depend on, because it can change just from updating something. I'm not saying you're doing this, just cautioning you because your phrasing implied you wanted to rely on it.
you are discussing my toy example as a real-life solution to my trying to run a few irc networks with incompatible fixtures. and i'm just in a weird position of defending my very much abstract proposal to solve the issue of conflicting fixtures.
That seems to be the core of the miscommunication, then. People usually come asking for solutions, rather than discussing potential solutions, so that's my default approach.
I agree this would best be continued elsewhere. If you want to talk more, you can find me in the Selenium slack/IRC 😁
so to recap on our previous conversation, in which we, i think, came to a certain kind of an agreement.
suppose you have the following tests that use fixtures of type dog: charlie and buddy, and fixtures of type cat: tom and simba:
def test_charlie_simba(charlie, simba): ...
def test_buddy_tom(buddy, tom): ...
def test_buddy_simba(buddy, simba): ...
def test_charlie_tom(charlie, tom): ...
notice that the dog fixtures are the leftmost arguments. suppose you also have the following test that uses a cat first:
def test_simba_buddy(simba, buddy): ...
and this test that doesn't use fixtures at all:
def test_tree(): ...
you could have these tests organized in the following simple way, using regular function-scoped fixtures. note how every fixture is set up in the order that pytest guarantees, and also how tests only use the fixtures they need. when test_tree is active, no fixtures are run. this last quality is especially valuable when you run not all tests at once, but a subset of tests, or even individual tests.
now, the dog and cat fixtures are expensive. what if you want to reuse these fixtures? you could, then:
1. make the dog and cat fixtures module level. but here's a problem. now tests that don't need certain fixtures are using them, and when run in isolation, the setup is different! here's a little demonstration.
2. explicitly scope the fixtures. while this does work and yields the optimal number of fixture setups, there are a few downsides. note that you now have classes such as TestWithCharlie which you might find hard to name meaningfully. instead of having a test structure based on different behavior (or just your own idea of how tests should be organized), you are structuring them solely for the sake of optimization. also note that the dog fixture now has a wider scope than the cat fixtures, and so these are always instantiated first. so to run the fixtures for test_simba_buddy in the correct order, you have to override at least one fixture, even if all you want is to change its scope. all this makes the tests harder to understand.
3. introduce a new scope, "only the tests that use this fixture". perhaps you could think of it as, “this fixture is reusable at the same level of hierarchy”. the idea would be, if you have a setup plan looking like this:
setup charlie
setup tom
test_charlie_tom
teardown tom
teardown charlie
setup charlie
setup simba
test_charlie_simba
teardown simba
teardown charlie
given that the fixture charlie is using this new property, this could become:
setup charlie
setup tom
test_charlie_tom
teardown tom
setup simba
test_charlie_simba
teardown simba
teardown charlie
and if you had a test between these tests:
setup charlie
setup tom
test_charlie_tom
teardown tom
teardown charlie
test_tree
setup charlie
setup simba
test_charlie_simba
teardown simba
teardown charlie
you could rearrange the order of tests to still have the optimization, e.g.:
setup charlie
setup tom
test_charlie_tom
teardown tom
setup simba
test_charlie_simba
teardown simba
teardown charlie
test_tree
this approach would have a few advantages:
P.S. this kind of scope is perhaps not a scope at all and might be a step towards not using scopes at all.
A few comments:
note that you now have classes such as TestWithCharlie which you might find hard to name meaningfully
It may seem tricky at first, but only because naming things in programming is typically very challenging, and we're using nebulous concepts such as dog and cat which don't represent behavior and more represent dependencies, so there's a big piece of the picture missing.
In programming, namespaces and logical structures are used for organization, and the same is used here. Namespaces, provided through logical structures, allow you to represent branches in context. In the case of testing, that means dependencies and the behavior under test.
For example, you could have a package called end_to_end that represents your end-to-end tests, and in that package, you could have a test_login.py module, in which you have a TestMissingPassword class that has a few test methods like test_password_field_is_red and test_missing_password_error_shown. The package and module name together provide context, while the class name tells you what's different about that particular login test versus the others, and the test methods tell you what it was looking for.
The end result could be something like this:
end_to_end/test_login.py::TestBadPassword.test_password_field_is_red
end_to_end/test_login.py::TestBadPassword.test_username_field_is_red
end_to_end/test_login.py::TestBadPassword.test_bad_username_or_password_error_shown
end_to_end/test_login.py::TestMissingPassword.test_password_field_is_red
end_to_end/test_login.py::TestMissingPassword.test_missing_password_error_shown
end_to_end/test_login.py::TestSuccess.test_success_message_shown
end_to_end/test_login.py::TestSuccess.test_on_landing_page
end_to_end/test_login.py::TestSuccess.test_user_info_in_header
end_to_end/test_login.py::TestSuccess.test_sign_out_button_shown
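As a very rough sketch of how one of those classes might look (login_page is a hypothetical class-scoped fixture coming from a conftest.py, and the page-object methods are made up):

# end_to_end/test_login.py
import pytest

class TestMissingPassword:
    @pytest.fixture(scope="class", autouse=True)
    def submit_with_missing_password(self, login_page):
        # arrange + act once for the whole class: submit the form with the
        # password left blank, then let each test method assert one thing
        login_page.fill_username("someone@example.com")
        login_page.submit()

    def test_password_field_is_red(self, login_page):
        assert login_page.password_field_is_red()

    def test_missing_password_error_shown(self, login_page):
        assert login_page.missing_password_error_shown()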
instead of having a test structure based on different behavior (or just your own idea of how tests should be organized), you are structuring them solely for the sake of optimization.
Quite the contrary. Pytest being able to optimize the logical structure based approach is more of a secondary effect of the structure. The main focus of the structuring is to organize the tests in a coherent and conventional way.
Finding tests becomes easy because I can just follow the namespaces to locate a certain collection of tests or even fixtures, and even identify if there's a hole in coverage just from that (e.g. having a test_add_item_to_cart.py file, but no test_remove_item_from_cart.py file). This also makes it very easy to extend, since I can just jump down to the namespace that aligns with the test I want to add (as far as the structure is built out, of course), and build out only what's needed. The structure makes it clear exactly where things belong and can be found.
You also get a few cool benefits, like being able to easily target a collection of tests for execution. For example, if someone modified the login behavior, you can just do pytest end_to_end/test_login.py, and you're done.
Engineers use namespaces and logical structures like this because it's the best way of organizing things around functionality/behavior when writing code. That is why I recommend using it here. Not because of optimization (even though pytest provides it in this case), but because after almost 2 centuries of programming (I know, hard to believe, but it's true), this really is the best way of handling this that we've come up with, and it's the convention in every other aspect of programming.
Some things to note:
In other languages, classes can serve very specific roles and not be considered namespaces. But in Python, literally everything is a first-class object, including packages, modules, and classes, so the lines get blurred a bit. That said, it's still probably a safe bet to not consider classes a form of namespace in Python in general, but they are a means of organizing logic related to particular functionality/behavior. Also consider that Python doesn't really have a means of defining a traditional namespace inside of a module, so a class gets used for that. If pytest were made for another language with traditional namespace definitions, I might be advocating for the use of them instead of classes, depending on how the language worked.
The structure I'm advocating for doesn't have to be based around classes. It's really just the concept of those bubbles/XML tags. If you start with the "optimized" (I would say "organized" or "sorted" is a more appropriate term) "XML" layout we were talking about, e.g.:
setup charlie
setup tom
test_charlie_tom
teardown tom
setup simba
test_charlie_simba
teardown simba
teardown charlie
test_tree
or

<charlie>
    <tom>
        <test_charlie_tom />
    </tom>
    <simba>
        <test_charlie_simba />
    </simba>
</charlie>
<test_tree />
Then this tells you upfront what your structure should generally look like. What form each of these take is up to you and your preferences/needs, but namespaces are an incredibly useful tool and a perfect solution for this.
i think these two points boil down to the same thing. sure, if your test structure and the properties of your fixtures are similar, you can have the structure and naming as in your example. but this only fits the most simple cases.
to give a counterexample, suppose your code is making animal food and accessories, so your test structure looks like...
/test_food.py
    TestBones
        test_with_badger(badger, bone)
        test_with_cat(cat, bone)
        test_with_dog(dog, bone)
    TestCatnip
        test_with_badger(...)
        ...
/test_collar.py
    TestLightCollar
        test_with_newt(newt, collar)
        test_with_dog(dog, collar)
        ...
    TestHeavyCollar
        ...
now if you were to reuse the fixture dog: it's now deep in the structure, with a module and class above it. the only way to optimize this is to uproot the whole thing, making dog a top-level fixture. to keep close to the current structure, you'd probably have to go with package scope:
/test_with_dog
    /test_with_dog/test_food.py
        TestBones
            test_with_dog(dog, bone)
        TestCatnip
            test_with_dog(dog, catnip)
    /test_with_dog/test_collar.py
        ...
/test_with_other_animals
    /test_with_other_animals/test_food.py
        ...
i doubt anyone would find this acceptable.
P.S. this is a completely different topic, but instead of classes, pytest could use nested functions, e.g.
def tests_with_charlie(charlie):
    def test_charlie_name():
        assert charlie.name == "charlie"

    def test_charlie_age():
        assert charlie.age == 7
with this approach, it would be technically possible to get rid of scopes completely. you can see that both tests use the same fixture charlie
even though you don't know what the scope of it is. my proposal could work with scope-less fixtures as well, you'd just have to mark them as reusable:
@pytest.fixture(reusable=True)
def charlie():
    ...
the major downside to this is that it would involve much more “magic”
I think that may be the piece of info you're missing.
"Uproot" is a very strong word. It's really just cutting and pasting (and maybe some removing self
in the args if a class is involved).
There's no need to restructure other fixtures or even the fixture you want to be available in another area. As long as the fixture is in a namespace directly above where you want to have access to it, then you can request it.
This is what the conftest.py
files are for, which you can have more than one of (they can be used wherever you could have an __init__.py
). They basically provide fixtures (and even let you use pytest hooks) for the namespace they're a part of, much like how the __init__.py
allows you to "initialize" that namespace. It's all about namespaces, all the way down.
The scope of the moved fixture doesn't even have to change. Where the fixture is defined has no real impact (other than coincidence) on when in the order of operations a fixture will execute.
So you'd only have to move that dog
fixture up as high as you need until everything that needs it is directly below it (in terms of namespace).
For example, if I go with your first example (I added the tests root folder):
tests/test_food.py
    TestBones
        test_with_badger(badger, bone)
        test_with_cat(cat, bone)
        test_with_dog(dog, bone)
    TestCatnip
        test_with_badger(...)
        ...
tests/test_collar.py
    TestLightCollar
        test_with_newt(newt, collar)
        test_with_dog(dog, collar)
        ...
    TestHeavyCollar
        ...
Then I could have a tests/conftest.py
and I could define dog
in there with whatever scope it had before (I'm guessing class?). The dog
fixture doesn't even have to be defined in a class (and if it were, it wouldn't be available, because there has to be a straight line in terms of namespace). It'll still work exactly as if it were still defined where it was previously.
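For instance, a minimal sketch of what that tests/conftest.py could look like (class scope is just my guess, and the body is a hypothetical stand-in):
# tests/conftest.py
import pytest


@pytest.fixture(scope="class")
def dog():
    # whatever your real dog setup/teardown is goes here
    yield "dog"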
It's super handy, and I do want to emphasize that you can have more than one conftest.py
in a test suite. It's a really underutilized feature IMO.
i'm not sure how this can work? if you are to use the same dog
fixture instance in these two tests:
tests::test_food::TestBones::test_with_dog
tests::test_collar::TestLightCollar::test_with_dog
you can't use a class-scoped fixture, or even a module-scoped one, can you?
Not the same instance, no. My solution is just for providing the same fixture, not for providing the same instance of the fixture.
But that brings us back to relying on logical structures and the scopes tied to them.
Taking a look back at your original comment with more context, it looks like what you are looking for isn't really possible, or at least wouldn't result in something congruent.
Your examples up until now have only had two fixtures with the "only the tests that use this fixture" scope, but it doesn't seem to hold up beyond that. It quickly becomes too unpredictable because it inherently means the order of operations for one test can be dictated by any other test.
My solution is just for providing the same fixture, not for providing the same instance of the fixture.
if we are not optimizing, we can just use function-scoped fixtures...
too unpredictable
what exactly is unpredictable?
the order of operations for one test can be dictated by any other test
i don't think it can? how do you mean?
if we are not optimizing, we can just use function-scoped fixtures...
You absolutely could use function-scoped fixtures exclusively.
However, I never said don't optimize. Optimization is just a secondary concern.
My primary goal is organization, and scopes larger than the function level can help build a mental model by showing how dependencies are shared. But structuring things this way also makes optimizing trivial, because I've already laid everything out based on dependencies, so I just have to specify whether a fixture should run once for everything under its umbrella, or re-execute for every iteration of a certain scope.
For example, to adapt my solution above so that dog
is only executed once for all those tests, I would just have to change its scope to "package"
.
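Something along these lines (still just a sketch):
# tests/conftest.py
import pytest


@pytest.fixture(scope="package")
def dog():
    # set up once for the whole tests/ package and shared by every test
    # underneath it that requests dog
    yield "dog"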
Pytest doesn't really optimize for you. But it gives you the tools that make optimizing very easy through scopes and structures. In other words, it doesn't do the optimization work for you; you line the dominoes up, and it follows along. That's what's being talked about here (although I think I should probably update that bit in the docs to make it clearer).
Things also just get a little janky at the function level, so I try to stay away from that.
the order of operations for one test can be dictated by any other test
i don't think it can? how do you mean?
When there isn't a clear, explicit chain of dependencies in fixtures, pytest has to make assumptions about which fixture should execute before another. For example, in this case:
@pytest.fixture
def x():
    pass


@pytest.fixture
def y(x):
    pass


def test_thing(y, x):
    pass
it's clear that y
is dependent on x
, and therefore x
must execute first. But in this example:
@pytest.fixture
def x():
    pass


@pytest.fixture
def y():
    pass


def test_thing(y, x):
    pass
it's unclear which should execute first, so pytest has to pick one to happen first. However, no matter how many other tests are also run, the other tests that are run can't influence the order that pytest chooses for that test, and of the possible ways it could choose, none of them really conflict with each other.
The "only the tests that use this fixture"
scope you're proposing would have pytest attempt to optimize when that fixture is run, so it's only active during the fewest tests possible while still only executing once. Because this scope effectively provides instructions for where that fixture should be placed in the order of operations, using it means you should be able to expect that fixture to execute at a consistent point in that order every time, no matter what other tests are supposed to be running.
Consider the following fixtures and tests:
@pytest.fixture(scope="only the tests that use this fixture")
def x():
    pass


@pytest.fixture(scope="only the tests that use this fixture")
def y():
    pass


@pytest.fixture(scope="only the tests that use this fixture")
def z():
    pass


def test_x(x):
    pass


def test_x_y(x, y):
    pass


def test_y(y):
    pass


def test_y_z(y, z):
    pass


def test_z(z):
    pass


def test_z_x(z, x):
    pass
The order of operations for just calling pytest
can't really be determined through static analysis, but we can probably figure out all the possibilities (I won't list them, of course, because that's a lot of text). However, the grouping would be pretty strict for the following commands:
pytest -k x, which runs these tests:
test::test_x
test::test_x_y
test::test_z_x
with these groupings:
<x>
    <test_x />
    <y>
        <test_x_y />
    </y>
    <z>
        <test_z_x />
    </z>
</x>
pytest -k y, which runs these tests:
test::test_y
test::test_x_y
test::test_y_z
with these groupings:
<y>
    <test_y />
    <x>
        <test_x_y />
    </x>
    <z>
        <test_y_z />
    </z>
</y>
pytest -k z, which runs these tests:
test::test_z
test::test_z_x
test::test_y_z
with these groupings:
<z>
    <test_z />
    <x>
        <test_z_x />
    </x>
    <y>
        <test_y_z />
    </y>
</z>
While the sorting within the outermost group may vary, the nesting is predictable. These groupings are all in direct conflict with each other. What makes it unpredictable is that a given test can only sometimes be certain of its order of operations, because pytest would have to consider every test it is attempting to run before deciding what the OoO for that test would be.
For example, in this case, if test_z_x
knows test_y_z
will also be running, but test_x_y
won't, then it can know that z
will be executing before x
. But if all 3 will be running, it won't have any idea. It's effectively ambiguous ambiguity.
def test_thing(y, x): pass
it's unclear which should execute first, so pytest has to pick one to happen first
but it is clear? according to the documentation, y
should run first (and it does in this case)
pytest -k x, which runs these tests:
test::test_x test::test_x_y test::test_z_x
with these groupings
<x> <test_x /> <y> <test_x_y /> </y> <z> <test_z_x /> </z> </x>
but this is wrong? you have fixture z
inside of fixture x
. the correct grouping that follows existing pytest promises (and my proposal) would be:
<x>
    <test_x />
    <y>
        <test_x_y />
    </y>
</x>
<z>
    <x>
        <test_z_x />
    </x>
</z>
similarly, for -k y
, you'd have:
<y>
    <test_y />
    <z>
        <test_y_z />
    </z>
</y>
<x>
    <y>
        <test_x_y />
    </y>
</x>
and for -k z
, you'd have:
<z>
    <test_z />
    <x>
        <test_z_x />
    </x>
</z>
<y>
    <z>
        <test_y_z />
    </z>
</y>
or for all tests
<x>
    <test_x />
    <y>
        <test_x_y />
    </y>
</x>
<y>
    <test_y />
    <z>
        <test_y_z />
    </z>
</y>
<z>
    <test_z />
    <x>
        <test_z_x />
    </x>
</z>
there's no ambiguity whatsoever?
but it is clear? according to the documentation, y should run first (and it does in this case)
Unfortunately, no. I see where that's referenced in the docs, but it's actually incorrect, and it's quite easy to prove. If those fixtures were module-scoped, and there was another test function with the order reversed, it would be ambiguous.
This comes up every once in a while, but nothing outside of the fixture scope/dependency system is intended to control order of operations, because it would fall apart too easily otherwise.
If you need something to happen before something else, the only reliable ways to do this are to have the latter thing request the former thing, to rely on the latter thing having a smaller scope than the former thing, or to have the former thing be autouse and make sure the latter thing isn't requested by any autouse things (that last one isn't too dependable, for the same reason).
The fact that it works is merely coincidence and a result of a deterministic fixture pipeline (deterministic in the literal sense). The requested fixtures are stored as a list, and iteration over a list is deterministic (this is actually touching the area of that serious bug we talked about, and I believe this or something related came up on one of the tickets).
I should make a note to fix that... There's actually a lot of stuff in there I should touch up haha
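To make those reliable techniques concrete, here's a minimal sketch of the first two (all fixture names are hypothetical):
import pytest


# 1. The latter thing requests the former thing, so database is guaranteed
#    to execute before user.
@pytest.fixture
def database():
    yield "db"


@pytest.fixture
def user(database):
    yield f"user in {database}"


# 2. The latter thing has a smaller scope than the former thing: when a test
#    requests both, the session-scoped server is set up before the
#    function-scoped client.
@pytest.fixture(scope="session")
def server():
    yield "server"


@pytest.fixture
def client():
    yield "client"


def test_ordering(user, server, client):
    assert user and server and client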
but this is wrong? you have fixture z inside of fixture x. the correct grouping that follows existing pytest promises (and my proposal) would be:
There is nothing to indicate z can't be in x, unless certain tests aren't executed, which isn't always the case, hence the ambiguous ambiguity.
Like I said, unless you rely on scope or an explicit dependency chain (or autouse, but that's kinda iffy), then the rules become ambiguous.
Nope. I see where that's referenced in the docs, but it's actually incorrect, and it's quite easy to prove. If those fixtures were module-scoped, and there was another test function with the order reversed, it would be ambiguous.
so, you are saying that the documentation is wrong, and that ambiguity is acceptable? and then you are arguing against a proposal that does away with ambiguity?
There is nothing to indicate z can't be in x
however wrong, the pytest documentation is still indicating that. also, the text of my proposal is indicating that. again, as i mentioned, how the tests are actually run is quite orthogonal to it. if you welcome ambiguity, your conflicting examples in the comment before the last one will do the job. if you don't welcome it, you can have a very predictable test structure.
so, you are saying that the documentation is wrong, and that ambiguity is acceptable?
The documentation is incorrect, yes, but the ambiguity isn't a question of acceptability. I'm saying that ambiguity is unavoidable for pytest as a framework, as I demonstrated with the bit about swapping the order and making the fixtures module-scoped, and the example with the x, y, and z fixtures.
and then you are arguing against a proposal that does away with ambiguity?
No, I'm arguing against a proposal that allows the order of operations of tests to be affected by the existence/execution of other tests.
I abhor ambiguity. And that's exactly why I don't want pytest making decisions about the order of operations on my behalf.
Your proposal forces pytest to make decisions on your behalf with regards to the order of operations, and do so differently based on which tests are running. Since it's making decisions for you, that means there's still ambiguity. It only becomes less ambiguous in the context of you executing a select subset of your tests, but even then, it requires pytest to consider all the tests that will be executed for that test run before it (or you) can know how to make it less ambiguous.
As it stands, pytest already provides all the tools necessary to eliminate ambiguity. The tools don't guarantee ambiguity will be eliminated, but that's unavoidable with most things in programming. If someone uses the fixture scopes as they are now, and leaves ambiguity, that's not good either.
If you don't want ambiguity, the only way to actually solve it is to establish clear dependency chains.
okay, just to be clear. do you find this test acceptable?
@pytest.fixture
def x():
    pass


@pytest.fixture
def y():
    pass


def test_y_x(y, x):
    pass
you are saying:
it's unclear which should execute first, so pytest has to pick one to happen first. However, no matter how many other tests are also run, the other tests that are run can't influence the order that pytest chooses for that test, and of the possible ways it could choose, none of them really conflict with each other.
this is also true about my proposal, isn't it? so if you find this acceptable, should you not find my proposal acceptable?
No, I'm arguing against a proposal that allows the order of operations of tests to be affected by the existence/execution of other tests.
i think i showed how the order of execution of tests in my proposal, regardless of the above, does not in any way change the setup plan of individual fixtures. do you mean something else by “order of operations of tests”?
Since it's making decisions for you, that means there's still ambiguity
would you explain what decisions pytest would be making?
if a fixture a
depends on b
, then b
is executed first. this might seem obvious, but it is only true as this is something that pytest documentation guarantees. if there is a test test_foo(a, b)
, then, according to my proposal, fixture a
must be run before fixture b
. pytest doesn't have to make any decisions; the documentation would clearly state what pytest should do in both cases.
I'm saying that ambiguity is unavoidable for pytest as a framework
i don't see why it's unavoidable in principle. if you just remove explicit scopes altogether, you won't have this problem
okay, just to be clear. do you find this test acceptable? ... this is also true about my proposal, isn't it? so if you find this acceptable, should you not find my proposal acceptable?
Ah, no. That I do not find acceptable, because there is ambiguity as it's unclear if x
should execute before y
or the other way around (remember, the docs are wrong).
My goal with the x/y/z example was to demonstrate that your proposal doesn't get rid of ambiguity, because pytest would still have to decide which fixtures execute before the others, and it would change based on the tests that would be running for that test run.
i think i showed how the order of execution of tests in my proposal, regardless of the above, does not in any way change the setup plan of individual fixtures. do you mean something else by “order of operations of tests”? ... would you explain what decisions pytest would be making?
I see what you're saying, but your proposal does not do enough to eliminate ambiguity because a given test can still have a different fixture execution order depending on what other tests are running at that time. At that point, it wouldn't bring anything to the table we don't already have with how fixtures currently work.
As I demonstrated in the x/y/z example, pytest still has to decide which fixture executes before another.
In that x/y/z example, if I run pytest -k z
, it would run these tests:
test::test_z
test::test_z_x
test::test_y_z
with these groupings:
<z>
    <test_z />
    <x>
        <test_z_x />
    </x>
    <y>
        <test_y_z />
    </y>
</z>
and if I run pytest -k y
, then I get these tests:
test::test_y
test::test_x_y
test::test_y_z
with these groupings:
<y>
    <test_y />
    <x>
        <test_x_y />
    </x>
    <z>
        <test_y_z />
    </z>
</y>
It's not specified whether y
should always execute before z
, or the other way around. So in the pytest -k y
case, y
happens before z
, and in the pytest -k z
case, z
happens before y
.
Pytest had to decide which would execute before the other in both cases, because the information was not provided by the programmer.
if a fixture a depends on b, then b is executed first. this might seem obvious, but it is only true as this is something that pytest documentation guarantees. if there is a test
test_foo(a, b)
, then, according to my proposal, fixture a must be run before fixture b. pytest doesn't have to make any decisions, the documentation would clearly state what pytest should do in both cases.
Again, the documentation is wrong, and needs to be corrected. Fixture request order in a given test/fixture signature simply cannot guarantee fixture execution order, and this can be demonstrated like so:
import pytest


@pytest.fixture(scope="module")
def order():
    return []


@pytest.fixture(scope="module")
def x(order):
    order.append("x")


@pytest.fixture(scope="module")
def y(order):
    order.append("y")


def test_x_y(x, y, order):
    assert order == ["x", "y"]


def test_y_x(y, x, order):
    assert order == ["y", "x"]
If either test is run in isolation, they will pass. But if run as a suite, one will always fail, despite each one providing what you believe to be clear instructions on which fixture should execute first.
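Assuming that snippet lives in a file named test_order_demo.py (a name I'm choosing purely for illustration), you can see it for yourself:
pytest test_order_demo.py::test_x_y
pytest test_order_demo.py::test_y_x
pytest test_order_demo.py
The first two invocations pass; the third runs both tests against the same module-scoped fixtures, so one of the two assertions has to fail.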
i don't see why it's unavoidable in principle. if you just remove explicit scopes altogether, you won't have this problem
Removing explicit scopes would reduce everything to the function scope level, as pytest wouldn't be able to assume which fixtures don't need to be re-executed for certain groups of tests, and wouldn't eliminate the ambiguity. Pytest would still have to decide for you which fixtures go before others if clear dependencies aren't established explicitly.
If everything were reduced to the function level because explicit scopes were removed, and the order fixtures are requested in fixture/test signatures did control execution order, then it still wouldn't eliminate ambiguity because of autouse fixtures. For example:
import pytest


@pytest.fixture
def order():
    return []


# Neither autouse fixture appears in the test signature, yet both still run.
@pytest.fixture(autouse=True)
def y(order):
    order.append("y")


@pytest.fixture(autouse=True)
def x(order):
    order.append("x")


def test_order(order):
    assert order == ["y", "x"]
If explicit scopes and autouse fixtures were eliminated, and fixture request order did control fixture execution order, then fixture execution order could be determined in exactly the same way as MRO, and only then would ambiguity be eliminated (an algorithm other than MRO could be used, but then it would just be inconsistent with Python as a whole), because you'd be forced to explicitly state what is dependent on what, and in what order. But that's already a requirement if you want to eliminate ambiguity with how pytest currently works.
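As a rough sketch of the fully explicit style that's already possible today (all names hypothetical), a pure request chain leaves pytest nothing to decide:
import pytest


@pytest.fixture
def a():
    return "a"


@pytest.fixture
def b(a):
    # b explicitly depends on a, so a always executes first
    return a + "b"


@pytest.fixture
def c(b):
    # a -> b -> c is the only possible resolution order
    return b + "c"


def test_chain(c):
    assert c == "abc"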
If you then make a new proposal after eliminating scope and autouse fixtures to somehow mark fixtures so that they know which groups of tests they don't have to re-execute between, you'd have gone full circle and reimplemented scopes.
That's why the potential for ambiguity is unavoidable in pytest.
Hello. Thanks for a great test framework. I have not worked long with pytest and I'm sorry if there already is a way to handle this.
I want to use pytest for testing on hardware where we can have some configurations. It is easy to set configurations before each test, but it unfortunately takes up to a minute (as our hw needs to be restarted after each config change). So to save time I want to group tests that have the same configurations.
I can easily use both markers and fixtures to set or act on configurations, but unfortunately I have not found any way to tell pytest which configurations of our hardware are incompatible.
Here is an example where we run into problems:
The fixtures will group tests to minimize the number of open fixtures. However, pytest does not know that the fixtures set_config_1_to_1 and set_config_1_to_2 cannot both be active at the same time.
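(For illustration only, fixtures along these lines reproduce the conflict; apply_hw_config and the option/value pairs are hypothetical stand-ins:)
import pytest


def apply_hw_config(option, value):
    """Hypothetical stand-in for the slow reconfigure-and-restart step."""


@pytest.fixture(scope="module")
def set_config_1_to_1():
    apply_hw_config(option=1, value=1)
    yield


@pytest.fixture(scope="module")
def set_config_1_to_2():
    # conflicts with set_config_1_to_1: both drive the same hardware option
    apply_hw_config(option=1, value=2)
    yield


def test_with_value_1(set_config_1_to_1):
    ...


def test_with_value_2(set_config_1_to_2):
    ...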
I would also like this to work through an entire session so we can run all tests in our repos with one config and then change config. Then run next config etc. Currently my solution is to handle these configurations outside pytest and run applicable ones based on whatever configurations are active.
That has the downside that tests are run multiple times, as many combinations of configurations are applicable for some of our tests, leading to longer test times.