I think this change would be moving us in the wrong direction if we intend to run the prediction out of mirgecom@main. In my opinion, if the upstream packages cannot support prediction, then their tests of mirgecom deserve to fail. I don't like the strategy of massaging the tests or massaging what gets tested to cater specially to the upstream testing.
If the goal of mirgecom@main is to keep it in a state that always works with all subs@main, then I can see this change, but I thought one of our major goals was to make mirgecom@main capable of running prediction. I'd like to move the prediction testing to main, but that will definitely cause the upstream packages that cannot support prediction to fail their downstream testing of mirgecom.
> I think this change would be moving us in the wrong direction if we intend to run the prediction out of mirgecom@main. In my opinion, if the upstream packages cannot support prediction, then their tests of mirgecom deserve to fail. I don't like the strategy of massaging the tests or massaging what gets tested to cater specially to the upstream testing.
>
> If the goal of mirgecom@main is to keep it in a state that always works with all subs@main, then I can see this change, but I thought one of our major goals was to make mirgecom@main capable of running prediction. I'd like to move the prediction testing to main, but that will definitely cause the upstream packages that cannot support prediction to fail their downstream testing of mirgecom.
This PR (short of the `requirements.txt` change, which is temporary) shouldn't affect whether we can run the prediction out of main. The idea is to make the functionality that depends on inducer/grudge#285 optional until that PR gets merged, so that we can use it here in the meantime without also breaking all of the upstream CI. This does mean that any prediction tests that get added will need to be conditional as well.

In the future I think we should try to avoid changing `requirements.txt` on mirgecom@main; it tends to make a mess of things.
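
To make the idea concrete, here is a minimal sketch of the kind of feature gate meant above (not the literal code in this PR; the `multivolume_coupling` attribute probed below is a made-up stand-in for whatever API inducer/grudge#285 actually adds):

```python
def grudge_supports_coupling() -> bool:
    """Return True if the installed grudge provides the coupling API."""
    try:
        import grudge
    except ImportError:
        return False
    # Hypothetical capability marker; a real check would probe the names
    # that the coupling-enabled grudge branch actually introduces.
    return hasattr(grudge, "multivolume_coupling")


# Everything coupling-related (imports, examples, tests) can key off this flag.
HAVE_GRUDGE_COUPLING = grudge_supports_coupling()
```

With a single flag like that, upstream CI running against plain grudge@main simply skips the coupled functionality instead of failing on import.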
> This PR (short of the `requirements.txt` change, which is temporary) shouldn't affect whether we can run the prediction out of main. The idea is to make the functionality that depends on inducer/grudge#285 optional until that PR gets merged, so that we can use it here in the meantime without also breaking all of the upstream CI.
I get that, but that's what I meant. This change set just disables or avoids the problem that the upstream package deserves to encounter because it is not up to running what mirgecom needs to do the prediction.
> This does mean that any prediction tests that get added will need to be conditional as well.
We cannot effectively test prediction in main with this strategy. I think what you are saying is that if our tests break the upstream CI tests, then we should change our tests. Should the prediction run from main? It cannot unless we change requirements@main to support prediction.
> In the future I think we should try to avoid changing `requirements.txt` on mirgecom@main; it tends to make a mess of things.
It depends on our goals for mirgecom@main. If the goal is to keep the upstream packages' CI working on all their main branches, then I agree. If the goal is to do prediction from mirgecom@main, then the tests in mirgecom@main that check whether that is possible deserve to fail when the upstream packages do not support prediction.
At some point we should have a group discussion all together about what our goals are for mirgecom@main, and why the push to merge prediction capabilities to main. I am not opposed to keeping the production branch and moving all subpkg version tweaks up into production, but I think we should agree on a strategy together.
I've been pressured to step on the gas and make mirgecom@main capable of prediction, so that's been my focus, and the motivation behind the concern over the strategy taken by this change set.
> I get that, but that's what I meant. This change set just disables or avoids the problem that the upstream package deserves to encounter because it is not up to running what mirgecom needs to do the prediction.
I don't think I agree with this. We change `requirements.txt` in mirgecom at our own risk. That shouldn't affect whether upstream CI deserves to pass or not.
> We cannot effectively test prediction in main with this strategy. I think what you are saying is that if our tests break the upstream CI tests, then we should change our tests. Should the prediction run from main? It cannot unless we change requirements@main to support prediction.
We don't change the tests; we make running them conditional on whether we have the needed requirements or not (i.e., they run in our CI, but not upstream).
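
In test terms, that could look something like the following sketch (the capability probe reuses the same made-up `multivolume_coupling` marker; it stands in for whatever the coupling-enabled grudge branch actually provides):

```python
import pytest

try:
    import grudge
    # Hypothetical capability marker, not a real grudge attribute.
    HAVE_GRUDGE_COUPLING = hasattr(grudge, "multivolume_coupling")
except ImportError:
    HAVE_GRUDGE_COUPLING = False

# Applies to every test in this module: they run in mirgecom's own CI (where
# requirements.txt points at the coupling-enabled branch) and are skipped in
# upstream CI that installs plain grudge@main.
pytestmark = pytest.mark.skipif(
    not HAVE_GRUDGE_COUPLING,
    reason="installed grudge does not provide multi-volume coupling")


def test_coupled_fluid_wall():
    # Coupling-dependent test body would go here.
    ...
```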
> It depends on our goals for mirgecom@main. If the goal is to keep the upstream packages' CI working on all their main branches, then I agree. If the goal is to do prediction from mirgecom@main, then the tests in mirgecom@main that check whether that is possible deserve to fail when the upstream packages do not support prediction.
>
> At some point we should have a group discussion all together about what our goals are for mirgecom@main, and why the push to merge prediction capabilities to main. I am not opposed to keeping the production branch and moving all subpkg version tweaks up into production, but I think we should agree on a strategy together.
>
> I've been pressured to step on the gas and make mirgecom@main capable of prediction, so that's been my focus, and the motivation behind the concern over the strategy taken by this change set.
I'm suggesting that generally we shouldn't consider PRs ready to merge until their upstream dependencies have been merged. I think it creates too many potential headaches.
> We don't change the tests; we make running them conditional on whether we have the needed requirements or not (i.e., they run in our CI, but not upstream).
This is exactly what I mean by "changing the tests", though. By changing them in this way, we choose to run only the tests that pass. I think the prediction tests should indicate the capabilities that are required to do our prediction, rather than us choosing to run only the ones that pass. The failing ones kind of define what we should do next.
> I'm suggesting that generally we shouldn't consider PRs ready to merge until their upstream dependencies have been merged. I think it creates too many potential headaches.
I don't disagree. That does mean that we realistically are not likely to run the prediction from main for a long time to come, and that's really ok with me. Just going by the rate at which the outstanding things are merging, and the rate that people are leaving.
It'd be great if production's only difference from main was in `requirements.txt` in that case. We could just switch all mirgecom@main back to subpkg development heads and upstream CIs would pass. That still leaves us with a hole in testing prediction. What if the prediction case had its own test script that upstream packages could call to test it? Then instead of making prediction tests part of the mirgecom tests, we could just run the driver-native downstream tests as their own package?
> This is exactly what I mean by "changing the tests", though. By changing them in this way, we choose to run only the tests that pass. I think the prediction tests should indicate the capabilities that are required to do our prediction, rather than us choosing to run only the ones that pass. The failing ones kind of define what we should do next.
I mean, I get why you would want some indication that the upstream package does not presently support the entire feature set required by mirgecom. I don't know if making the CI fail is the right way to do that though, since it prevents other things from being merged in the meantime (unless that CI job is made optional, which is also not great). IMO CI should fail if something is wrong, not if something is TBD. `pytest.xfail` is maybe along the lines of what you're after (though I suppose that also counts as modifying the tests).
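
For reference, the marker form of that would look roughly like the sketch below (again reusing the made-up `multivolume_coupling` capability probe):

```python
import pytest

try:
    import grudge
    # Hypothetical capability marker, not a real grudge attribute.
    HAVE_GRUDGE_COUPLING = hasattr(grudge, "multivolume_coupling")
except ImportError:
    HAVE_GRUDGE_COUPLING = False


# Unlike a skip, an expected failure still runs and shows up in the CI summary
# as "xfail", so the missing capability stays visible without turning the job red.
@pytest.mark.xfail(
    not HAVE_GRUDGE_COUPLING,
    reason="installed grudge does not yet support the coupling needed for prediction")
def test_prediction_wall_coupling():
    # Would exercise the coupled fluid/wall prediction setup.
    ...
```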
> I don't disagree. That does mean that we realistically are not likely to run the prediction from main for a long time to come, and that's really ok with me. Just going by the rate at which the outstanding things are merging, and the rate that people are leaving.
Yeah, I'm mainly thinking about the wall coupling at the moment, which was the immediate cause of the CI issues. The fusion, etc. changes are more difficult. For long-lived subpackage branches like that, doing what we did may have been the only viable option (otherwise there would have been long-term incompatibilities between main and production, IIRC?). One thing we could maybe do in those cases is start treating the forked subpackage as the canonical one, set up CI there, and temporarily forget about the inducer version (it could disable its mirgecom CI, or limit it to a subset of supported functionality like what this change does). Then any changes from the inducer fork could be periodically merged into the canonical fork via PR (which then must pass the CI that tests mirgecom's full prediction feature set).
> It'd be great if production's only difference from main was in `requirements.txt` in that case. We could just switch all mirgecom@main back to subpkg development heads and upstream CIs would pass. That still leaves us with a hole in testing prediction. What if the prediction case had its own test script that upstream packages could call to test it? Then instead of making prediction tests part of the mirgecom tests, we could just run the driver-native downstream tests as their own package?
(Not sure if I follow this part, I think I'd need to see more details.)
> I mean, I get why you would want some indication that the upstream package does not presently support the entire feature set required by mirgecom.
I think we need to consider whether the required feature set includes prediction. If so, then IMO the upstream packages' CI should show where that falls down; i.e., the upstream CI failures are meaningful.
> I don't know if making the CI fail is the right way to do that though, since it prevents other things from being merged in the meantime (unless that CI job is made optional, which is also not great). IMO CI should fail if something is wrong, not if something is TBD. `pytest.xfail` is maybe along the lines of what you're after (though I suppose that also counts as modifying the tests).
I don't know either, but I do have a very strong feeling that we need to plug the prediction testing holes that allow packages up and down the toolchain to be tweaked in ways that break, or at least frustrate, the prediction effort. I think one of the main questions here is whether that prediction hole should be plugged by mirgecom@main or mirgecom@production. Both have deep implications for the specifics of how we can test the prediction capability.
> Yeah, I'm mainly thinking about the wall coupling at the moment, which was the immediate cause of the CI issues.
Right, but wall coupling is required for prediction, and for effective testing of prediction capabilities. If we intend to run the prediction out of mirgecom@main, then that prediction capability should be tested in mirgecom@main, and the upstream CI failure is a meaningful indication that the tested branch (say of grudge or arraycontext) does not provide required capabilities. This was the point which gave me pause over this current change set.
> The fusion, etc. changes are more difficult. For long-lived subpackage branches like that, doing what we did may have been the only viable option (otherwise there would have been long-term incompatibilities between main and production, IIRC?). One thing we could maybe do in those cases is start treating the forked subpackage as the canonical one, set up CI there, and temporarily forget about the inducer version (it could disable its mirgecom CI, or limit it to a subset of supported functionality like what this change does). Then any changes from the inducer fork could be periodically merged into the canonical fork via PR (which then must pass the CI that tests mirgecom's full prediction feature set).
I like this suggestion, and I think I don't fully appreciate the difference between this approach and the one we have taken for mirgecom. It was my understanding that when we change `requirements.txt` in mirgecom@main, we basically have done what you suggested - made those indicated versions the "canonical" ones. I think that was the intent, at least.
> What if the prediction case had its own test script that upstream packages could call to test it? Then instead of making prediction tests part of the mirgecom tests, we could just run the driver-native downstream tests as their own package?

> (Not sure if I follow this part, I think I'd need to see more details.)
In our current strategy, prediction drivers are not tested by the upstream pkg CIs. Our current prediction testing is really only effective for changes made to mirgecom itself. The only prediction capabilities that are tested by upstream CI are those that are tested indirectly by running the examples in mirgecom@main - which is (iiuc) what the upstream CI tests do. That summarizes the big prediction testing hole I keep spouting on about. This hole frequently allows changes to be made (to upstream pkgs) which break or otherwise frustrate prediction efforts (and often at critical times).

On the other hand, mirgecom@main native CI actually runs the prediction drivers against proposed changes, and we get immediate feedback about whether those changes jibe with prediction. In a way, the native mirgecom@main CI's testing of prediction is our own "downstream" package test (i.e. if you view `prediction` as downstream of `mirgecom`).

So one thing that we could do is call `drivers_y3-prediction` a downstream package wrt mirgecom, and add that as an additional downstream testing suite that the upstream CIs (i.e. those of grudge, arraycontext, mirgecom, etc.) run. So for example, current grudge@main could pass its downstream testing of mirgecom, but would fail its downstream testing of `prediction` (and rightly so).

Regardless, I think we should have a discussion about this, and our strategy for mirgecom@main, mirgecom@production, and how to effectively test the prediction capabilities. Ultimately, I do not have a strong preference about the how and where, but I have a very strong preference for plugging the prediction testing holes.
> I like this suggestion, and I think I don't fully appreciate the difference between this approach and the one we have taken for mirgecom. It was my understanding that when we change `requirements.txt` in mirgecom@main, we basically have done what you suggested - made those indicated versions the "canonical" ones. I think that was the intent, at least.
The distinction just comes down to which fork CI gets run on. Currently we run each subpackage's CI on Andreas' fork even when we're not using it (i.e. we're using mine or Kaushik's). If we ran CI on those instead, we wouldn't have to worry about whether the branch has such-and-such capability or not; we know it does because that's what we're using for the prediction. At this point though it may not be worth the effort setting that up, because it sounds like Kaushik's additions to his branches will be winding down (and my branch is presently static apart from pulling things in from inducer).
> In our current strategy, prediction drivers are not tested by the upstream pkg CIs. Our current prediction testing is really only effective for changes made to mirgecom itself. The only prediction capabilities that are tested by upstream CI are those that are tested indirectly by running the examples in mirgecom@main - which is (iiuc) what the upstream CI tests do. That summarizes the big prediction testing hole I keep spouting on about. This hole frequently allows changes to be made (to upstream pkgs) which break or otherwise frustrate prediction efforts (and often at critical times).
>
> On the other hand, mirgecom@main native CI actually runs the prediction drivers against proposed changes, and we get immediate feedback about whether those changes jibe with prediction. In a way, the native mirgecom@main CI's testing of prediction is our own "downstream" package test (i.e. if you view `prediction` as downstream of `mirgecom`).
>
> So one thing that we could do is call `drivers_y3-prediction` a downstream package wrt mirgecom, and add that as an additional downstream testing suite that the upstream CIs (i.e. those of grudge, arraycontext, mirgecom, etc.) run. So for example, current grudge@main could pass its downstream testing of mirgecom, but would fail its downstream testing of `prediction` (and rightly so).
>
> Regardless, I think we should have a discussion about this, and our strategy for mirgecom@main, mirgecom@production, and how to effectively test the prediction capabilities. Ultimately, I do not have a strong preference about the how and where, but I have a very strong preference for plugging the prediction testing holes.
I see. Yeah, I don't know what would be preferable between adding a separate CI job vs. adding prediction-analogous tests to mirgecom. Maybe we can discuss this together on Friday. (I still don't think I agree that upstream CI should fail if it's missing a feature needed for the prediction, but I guess that's a matter of philosophy.)
> The distinction just comes down to which fork CI gets run on. Currently we run each subpackage's CI on Andreas' fork even when we're not using it (i.e. we're using mine or Kaushik's). If we ran CI on those instead, we wouldn't have to worry about whether the branch has such-and-such capability or not; we know it does because that's what we're using for the prediction. At this point though it may not be worth the effort setting that up, because it sounds like Kaushik's additions to his branches will be winding down (and my branch is presently static apart from pulling things in from inducer).
I like this even more now. What if the major upstream packages (including at least grudge, meshmode, pytato, and arraycontext) had a "production" or "prediction" branch or fork, and we always pointed at those? Then we could update them through PRs. I think that is what you are saying above, and I like it quite a lot. I'm not certain it would solve all of our prediction testing deficits, but I think it would help with some of the current dilemma over upstream pkg CIs.

It could be worth it for us to think more about this option, because it has the potential to help us run into fewer of these critical prediction conflicts moving forward, even if there will be fewer changes to the actual branches/forks coming up.

The question of whether we should maintain the prediction capability as a collection of outstanding developments merged together into a "production" branch, or run it out of main, is still something we should think about and come together on.

I suppose if we go with your idea of having "production" versions of the sub-packages, then a "production" version of mirgecom is a nice symmetry there and gives us a chance to build on things that aren't ready for merge. The "production" subpkgs could just do their downstream prediction CI tests with the "production" version of mirgecom.

On the other hand, the "production" approach encourages us to build off things that aren't ready for merge. If we have "production" subpackages, then we could just point mirgecom@main at those and never change them. A more rigid/strict approach would force us to merge more often and be more mindful of changes to the prediction-supporting toolchain.

Our choices now about this development "flow" and strategy have a pretty heavy effect on our choices for testing the prediction capabilities.
Superseded by #1048, closing.
Makes coupling functionality conditional on having a coupling-enabled branch of grudge.
cc @inducer
Questions for the review: