Closed. Bobgy closed this issue 2 years ago.
Notice that the postsubmit test before the release commit passed; however, the released version had a problematic tfx sample pipeline (ignore the failure in that run, it was expected for a release commit).
Integration test of the passing commit: https://oss-prow.knative.dev/view/gs/oss-prow/logs/kubeflow-pipeline-postsubmit-integration-test/1360157617023881216
What's even weirder: after the fix https://github.com/kubeflow/pipelines/pull/5165, postsubmit tests started to fail on the parameterized_tfx_sample pipeline.
One problem I found: the postsubmit integration test incorrectly uses the presubmit test script: https://github.com/GoogleCloudPlatform/oss-test-infra/blob/f29fc29cd617497ea44164ff6a1734c7dee3c0f4/prow/prowjobs/kubeflow/pipelines/kubeflow-pipelines-postsubmits.yaml#L50
EDIT: this is not the root cause of this problem.
I think I found the root cause: an incomplete upgrade of the tfx dependencies.
Let me explain the order of events:
1. pip started to fail for the Python 3.5 images, so we fixed the image build by upgrading to tfx==0.26.0 in https://github.com/kubeflow/pipelines/pull/5052. At this point the builtin sample was already failing, because the tfx SDK version was bumped to 0.26.0 while the tfx image was still at 0.22.0. However, postsubmit tests did not fail, because sample-test/requirements.in was still at tfx==0.22.0.
2. We fixed the builtin sample by updating the tfx image in #5165. However, postsubmit tests then started to fail for the tfx sample, because the tfx image is now at 0.27.0 while sample-test/requirements.in was still at tfx==0.22.0.
High-priority problems are fixed, and I made sample-test reuse the same requirements.in as the backend requirements.in.
What's still missing: people may update a requirements.in but forget to regenerate all the corresponding requirements.txt files (see the sketch below). EDIT: this still seems like a high-priority and easy-to-make mistake.
/assign @chensun Do you think you can take this?
Also, https://github.com/kubeflow/pipelines/pull/5187 is pending review.
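As an illustration of the kind of automated check that could catch the drift mentioned above (not an existing check in this repo), a presubmit step could recompile a requirements.in and fail when the checked-in requirements.txt no longer matches. A minimal sketch, assuming pip-tools is installed and taking two hypothetical paths on the command line:

```python
"""Hypothetical drift check: fail if requirements.txt no longer matches
the pins that pip-compile would produce from requirements.in."""
import subprocess
import sys
import tempfile
from pathlib import Path


def pins(text: str) -> set:
    """Extract bare pin lines, ignoring comments and `# via ...` annotations."""
    result = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            result.add(line)
    return result


def in_sync(req_in: Path, req_txt: Path) -> bool:
    # Recompile into a temp file (Unix-friendly handling) and compare pins.
    with tempfile.NamedTemporaryFile(suffix=".txt") as tmp:
        subprocess.run(
            ["pip-compile", "--quiet", "--no-header",
             "--output-file", tmp.name, str(req_in)],
            check=True,
        )
        fresh = Path(tmp.name).read_text()
    return pins(fresh) == pins(req_txt.read_text())


if __name__ == "__main__":
    req_in, req_txt = Path(sys.argv[1]), Path(sys.argv[2])
    if not in_sync(req_in, req_txt):
        sys.exit(f"{req_txt} is stale; re-run pip-compile on {req_in}")
```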
Aren't we deprecating requirements.in, given that sdk/python/requirements.txt is covered by https://github.com/kubeflow/pipelines/pull/5056?
@chensun for clarification, #5056 explicitly disabled renovate for python packages.
You can make the decision to enable it.
I see, thanks for the clarification. I checked again, and it did disable python. (I was under the wrong impression that sdk/python/requirements.txt was covered, because I saw an ignore list with some component paths, yet sdk is not in that list.)
> I want to echo again what I said here. I think `pip install -r sdk/python/requirements.txt` doesn't represent the most common user journey -- think about our notebook samples, it only has `pip install kfp` or `pip install kfp==<pinned version>`, but I've rarely seen `pip install -r sdk/python/requirements.txt`.
>
> I would suggest we move away from installing requirements.txt in tests, so the tests create an environment closer to a fresh installation of `pip install kfp`. If there's a newly emerged dependency issue, we would probably be able to see it in tests.
>
> P.S.: Taking tfx for example, their `requirements.txt` contains nothing but `-e .` (read from setup.py).

-- @chensun
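For concreteness, a minimal sketch of the fresh-install style test Chen describes (my illustration, not an existing test target; the smoke command is a placeholder and Unix-style virtualenv paths are assumed):

```python
"""Sketch: exercise the path a user actually takes -- a clean virtualenv
plus `pip install kfp` -- instead of installing a pinned requirements.txt."""
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    venv_dir = Path(tmp) / "venv"
    subprocess.run([sys.executable, "-m", "venv", str(venv_dir)], check=True)
    pip = str(venv_dir / "bin" / "pip")        # Unix layout assumed
    python = str(venv_dir / "bin" / "python")

    # Let pip resolve dependencies the way a fresh user install would,
    # with no requirements.txt constraining the solver.
    subprocess.run([pip, "install", "kfp"], check=True)

    # Placeholder smoke test: the SDK should import and report its version.
    subprocess.run([python, "-c", "import kfp; print(kfp.__version__)"],
                   check=True)
```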
Moving some of the discussion from the PR thread back to the main issue.
I agree with Chen that testing what users would actually get is important. However, we used to do that before introducing requirements.{in/txt}, and the result was that from time to time presubmit broke without any clues and we needed to dig through the dependencies to find out why.
I just want to make sure we are not going in circles; both approaches have their pros and cons.
I think the discussion laid out in https://github.com/kubeflow/pipelines/issues/4682 is not significantly different from what we have here. Maybe the best approach is still to have a requirements.txt, but set up a bot to update it periodically via PRs (see the sketch below). That way, if such an update PR fails, we know users might hit the same problem, but it won't block presubmit tests (or other people not working on this problem).
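A hedged sketch of such a periodic refresh job, assuming pip-tools and the sample-test paths discussed in this thread; opening the PR itself would be handled by whatever bot or scheduled CI job runs this:

```python
"""Sketch: re-resolve sample-test/requirements.in against the newest
available releases so the diff can be proposed as a bot PR rather than
breaking presubmit for unrelated changes."""
import subprocess

subprocess.run(
    ["pip-compile", "--upgrade",
     "--output-file", "sample-test/requirements.txt",
     "sample-test/requirements.in"],
    check=True,
)
# A scheduled job would commit sample-test/requirements.txt if it changed and
# open a PR; if that PR's tests fail, a fresh user install would likely hit
# the same dependency problem.
```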
> However, we used to do that before introducing requirements.{in/txt}, and the result was that from time to time presubmit broke without any clues and we needed to dig through the dependencies to find out why.

If I'm not mistaken, this is usually due to new versions of dependencies not being compatible with other existing dependencies. And that's a sign that we need to fix `setup.py` by adding upper-limit restrictions to the dependencies. Using `requirements.txt` in tests is not solving the underlying issue but hiding it.
The fact that many of the dependencies listed in the kfp `setup.py` don't have an upper version limit is problematic IMHO:
https://github.com/kubeflow/pipelines/blob/9bc63f59b27886a84bfcf4ece1062d489d06e0f5/sdk/python/setup.py#L24-L47
So I think one action item is to add upper limits regardless of whether we use `requirements.txt` in tests. WDYT?
EDIT: created https://github.com/kubeflow/pipelines/pull/5258
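For illustration only, not the actual kfp setup.py (the package name and version ranges below are placeholders), upper-bounded dependencies would look roughly like this:

```python
# Illustrative setup.py sketch: every dependency gets an upper bound so a
# future major release of a dependency cannot silently break fresh
# `pip install` environments.
from setuptools import find_packages, setup

setup(
    name="example-sdk",          # placeholder, not the real package
    version="0.0.1",
    packages=find_packages(),
    install_requires=[
        "PyYAML>=5.3,<6",        # version ranges are examples only
        "requests>=2.24,<3",
        "kubernetes>=10.0.0,<13",
    ],
)
```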
TODO: update TFX upgrade documentation
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
In #5137, the tfx sample pipeline was broken, but this was not caught by the postsubmit test.
Originally posted in https://github.com/kubeflow/pipelines/issues/5137#issuecomment-783788084
I think this is a high-priority issue, because it caused a lot of extra effort after 1.4.0 was released.
/cc @numerology @chensun