kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[Testing] tfx sample pipeline broken in release not caught by postsubmit #5178

Closed: Bobgy closed this issue 2 years ago

Bobgy commented 3 years ago

In #5137, the tfx sample pipeline was broken, but the breakage was not caught by the postsubmit test.

IIRC our integration test should cover this sample; do we know why this was not caught?

Originally posted in https://github.com/kubeflow/pipelines/issues/5137#issuecomment-783788084

I think this is a high-priority issue, because it caused a lot of extra effort after 1.4.0 was released.

/cc @numerology @chensun

Bobgy commented 3 years ago

[Screenshot: postsubmit test run history around the release commit]

Notice that the postsubmit test before the release commit passed, yet the released version still shipped a broken tfx sample pipeline (ignore the failure on the release commit itself; that was expected).

Integration test of the passing commit: https://oss-prow.knative.dev/view/gs/oss-prow/logs/kubeflow-pipeline-postsubmit-integration-test/1360157617023881216

What's even weirder: after the fix https://github.com/kubeflow/pipelines/pull/5165, postsubmit tests started to fail on the parameterized_tfx_sample pipeline.

Bobgy commented 3 years ago

One problem I found is that the postsubmit integration test incorrectly uses the presubmit test script: https://github.com/GoogleCloudPlatform/oss-test-infra/blob/f29fc29cd617497ea44164ff6a1734c7dee3c0f4/prow/prowjobs/kubeflow/pipelines/kubeflow-pipelines-postsubmits.yaml#L50

EDIT: this is not the root cause of this problem.

Bobgy commented 3 years ago

I think I found the root cause: an incomplete upgrade of the tfx dependencies.

Let me explain the order of events (a hedged sketch of a version-consistency check follows the list):

  1. pip started to fail for the Python 3.5 images, so we fixed the image build by upgrading to tfx==0.26.0 in https://github.com/kubeflow/pipelines/pull/5052.

    At this point the builtin sample was already failing, because the tfx SDK version was bumped to 0.26.0 while the tfx image was still at 0.22.0. However, postsubmit tests did not fail, because sample-test/requirements.in was still pinned at tfx==0.22.0.

  2. we released KFP 1.4.0.
  3. we got a user report that the builtin tfx sample fails -- as expected.
  4. we fixed the builtin sample by updating the tfx image: #5165.

    However, postsubmit tests then started to fail for the tfx sample, because the tfx image is now at 0.27.0 while sample-test/requirements.in was still pinned at tfx==0.22.0.
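To make the failure mode concrete, here is a minimal sketch (not code from this repo) of the kind of consistency check that could have caught the mismatch: it compares the tfx version pinned in the sample-test requirements file against the tfx version the sample image is expected to ship. The file path, script name, and the way the image version is passed in are assumptions for illustration only.

```python
# Hedged sketch: compare the tfx pin used by sample tests with the tfx version
# baked into the sample's container image. Paths and arguments are illustrative.
import re
import sys


def pinned_version(requirements_path: str, package: str) -> str:
    """Return the exact version pinned as `package==X.Y.Z` in a requirements file."""
    pattern = re.compile(rf"^{re.escape(package)}==([\w.\-]+)", re.IGNORECASE)
    with open(requirements_path) as f:
        for line in f:
            match = pattern.match(line.strip())
            if match:
                return match.group(1)
    raise ValueError(f"{package} is not pinned in {requirements_path}")


if __name__ == "__main__":
    # Usage (illustrative): python check_tfx_pin.py 0.27.0
    if len(sys.argv) != 2:
        sys.exit("usage: check_tfx_pin.py <tfx-version-in-sample-image>")
    image_version = sys.argv[1]
    test_pin = pinned_version("sample-test/requirements.in", "tfx")
    if test_pin != image_version:
        sys.exit(f"Mismatch: sample tests pin tfx=={test_pin}, image ships tfx=={image_version}")
    print(f"OK: both use tfx=={test_pin}")
```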

Conclusion

Bobgy commented 3 years ago

The high-priority problems are fixed, and I made the sample tests import the same requirements.in from the backend requirements.in (a sketch of the include pattern is below).

What's still missing: people may update requirements.in but forget to regenerate all the requirements.txt files. EDIT: this still seems like a high-priority and easy-to-make mistake.
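For illustration, this is roughly what the include pattern could look like with pip-tools; the file paths below are assumptions, not the repo's exact layout. A `.in` file can pull in another requirements file with `-r`, and `pip-compile` then regenerates the pinned `requirements.txt`, which is the step that is easy to forget.

```
# sample-test/requirements.in (illustrative path) -- reuse the backend's
# direct dependencies instead of repeating the tfx pin here:
-r ../backend/requirements.in

# After editing any *.in file, the pinned lock file must be regenerated,
# e.g. (illustrative paths):
#   pip-compile sample-test/requirements.in --output-file sample-test/requirements.txt
```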

/assign @chensun Do you think you can take this?

Bobgy commented 3 years ago

Also, https://github.com/kubeflow/pipelines/pull/5187 is pending review.

chensun commented 3 years ago

> The high-priority problems are fixed, and I made the sample tests import the same requirements.in from the backend requirements.in.
>
> What's still missing: people may update requirements.in but forget to regenerate all the requirements.txt files. EDIT: this still seems like a high-priority and easy-to-make mistake.
>
> /assign @chensun Do you think you can take this?

Aren't we deprecating requirements.in, given that sdk/python/requirements.txt is covered by https://github.com/kubeflow/pipelines/pull/5056?

Bobgy commented 3 years ago

@chensun for clarification, #5056 explicitly disabled renovate for python packages.

You can make the decision to enable it.

chensun commented 3 years ago

> @chensun for clarification, #5056 explicitly disabled renovate for python packages.
>
> You can make the decision to enable it.

I see, thanks for the clarification. I checked again, and it did disable python. (I was under the wrong impression that sdk/python/requirements.txt was covered, because I saw an ignore list with some components paths, yet sdk is not in that list.)

Bobgy commented 3 years ago

> I want to echo again what I said here. I think pip install -r sdk/python/requirements.txt doesn't represent the most common user journey -- think about our notebook samples: they only have pip install kfp or pip install kfp==<pinned version>, and I've rarely seen pip install -r sdk/python/requirements.txt.
>
> I would suggest we move away from installing requirements.txt in tests, so that the tests create an environment closer to a fresh installation of pip install kfp. If a new dependency issue emerges, we would probably be able to see it in the tests.
>
> P.S.: taking tfx as an example, their requirements.txt contains nothing but -e . (i.e., read from setup.py).
>
> -- @chensun

Moving some of the discussion from the PR thread back to the main issue.

I agree with Chen that testing what users would actually get is important. However, we used to do exactly that before introducing requirements.{in,txt}, and the result was that from time to time presubmit broke without any clues and we needed to dig through the dependencies to find out why.

I just want to make sure we are not going in circles; we should admit that both approaches have their pros and cons.

I think the discussion laid out in https://github.com/kubeflow/pipelines/issues/4682 is not significantly different from what we have here. Maybe the best approach is still to have a requirements.txt, but set up a bot to update it periodically via PRs. That way, if the update PR fails, we know users might hit the same problem, but it won't block presubmit tests (or other people not working on this problem).

chensun commented 3 years ago

> However, we used to do exactly that before introducing requirements.{in,txt}, and the result was that from time to time presubmit broke without any clues and we needed to dig through the dependencies to find out why.

If I'm not mistaken, this is usually due to new versions of dependencies being incompatible with other existing dependencies. That's a sign that we need to fix setup.py by adding upper version limits to those dependencies. Using requirements.txt in tests doesn't solve the underlying issue; it hides it.

The fact that many of the dependencies listed in kfp's setup.py don't have an upper version limit is problematic IMHO: https://github.com/kubeflow/pipelines/blob/9bc63f59b27886a84bfcf4ece1062d489d06e0f5/sdk/python/setup.py#L24-L47 So I think one action item is to add upper limits, regardless of whether we use requirements.txt in tests. WDYT?

EDIT: created https://github.com/kubeflow/pipelines/pull/5258
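As a rough illustration of the suggestion above (not the actual contents of sdk/python/setup.py or of the linked PR), upper-bounded pins in install_requires could look like the sketch below; the package names and version ranges are placeholders.

```python
# Hedged sketch of upper-bounded dependencies in a setup.py; names and bounds
# are placeholders, not the versions kfp actually requires.
from setuptools import setup, find_packages

REQUIRES = [
    # Lower bound for the features we rely on; upper bound blocks the next
    # major release until it has been tested against the SDK.
    "PyYAML>=5.3,<6",
    "kubernetes>=10.0.0,<13",
    "requests-toolbelt>=0.8.0,<1",
]

setup(
    name="kfp",
    version="0.0.0",  # placeholder
    packages=find_packages(),
    install_requires=REQUIRES,
)
```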

Bobgy commented 3 years ago

TODO: update TFX upgrade documentation

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.