kubeflow / examples

A repository to host extended examples and tutorials
Apache License 2.0

[object_detection] presubmit/postsubmit test for training job #231

Closed jlewi closed 5 years ago

jlewi commented 6 years ago

It would be great to have an E2E test to ensure object detection training is working.

This could just be an Argo Workflow with a single step that invokes training and checks that it runs for a couple of steps.
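To make the idea concrete, here is a minimal sketch of what a one-step Argo Workflow for this could look like, built as a plain Python dict. The image name, the `test_runner` module invocation, and the `--train_steps` flag are hypothetical placeholders, not names taken from the object detection example.

```python
import json


def build_workflow(image, train_steps=2):
    """Return a one-step Argo Workflow manifest as a plain dict.

    The container command and the --train_steps flag are assumptions
    made for illustration; the real test entry point may differ.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "object-detection-e2e-"},
        "spec": {
            "entrypoint": "run-training",
            "templates": [
                {
                    "name": "run-training",
                    "container": {
                        "image": image,
                        # Hypothetical entry point that submits the training
                        # job and checks that it ran for a few steps.
                        "command": ["python", "-m", "test_runner"],
                        "args": ["--train_steps", str(train_steps)],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image name, for illustration only.
    workflow = build_workflow("example.com/object-detection-test:latest")
    print(json.dumps(workflow, indent=2))
```

The printed JSON could be saved to a file and submitted to a cluster with Argo installed. Keeping the workflow to a single template keeps the test cheap to run on every presubmit.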

We can probably start with or repurpose https://github.com/kubeflow/tf-operator/blob/master/py/test_runner.py as a test runner. That Python program submits a TFJob based on a ksonnet spec and then runs some checks before producing the output files.
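A rough sketch of the "wait, check, write output" part of such a runner is below. It is not the code in test_runner.py: the `get_status` callable stands in for whatever actually queries the TFJob, the `Succeeded`/`Failed` status strings are assumed terminal states, and writing a JUnit-style XML file is an assumption about what the output files look like.

```python
import time
import xml.etree.ElementTree as ET


def wait_for_job(get_status, timeout_s=600, poll_s=10, sleep=time.sleep):
    """Poll get_status() until the job reaches a terminal state.

    get_status is a hypothetical callable returning a status string;
    'Succeeded' and 'Failed' are assumed to be the terminal states.
    Returns the terminal status, or 'Timeout' if the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("Succeeded", "Failed"):
            return status
        sleep(poll_s)
    return "Timeout"


def write_junit(path, test_name, status, duration_s):
    """Write a single-test JUnit-style XML report (assumed output format)."""
    failed = status != "Succeeded"
    suite = ET.Element(
        "testsuite",
        name="object_detection",
        tests="1",
        failures="1" if failed else "0",
    )
    case = ET.SubElement(
        suite, "testcase", name=test_name, time=f"{duration_s:.1f}"
    )
    if failed:
        ET.SubElement(case, "failure", message=f"job ended with status {status}")
    ET.ElementTree(suite).write(path, encoding="utf-8", xml_declaration=True)
```

Passing the status lookup in as a callable keeps the polling logic testable without a cluster; the real runner would replace it with a call that reads the TFJob's status from Kubernetes.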

We could run the E2E tests against our dev cluster, which is running the most recent stable version of Kubeflow. This way our E2E tests don't have to provision a new cluster.

/cc @ldcastell

ldcastell commented 6 years ago

I think this is a great idea. Would this be an automated process, or will it be manually triggered?

jlewi commented 6 years ago

It would be automated, just like our other tests. See: https://github.com/kubeflow/testing#adding-an-e2e-test-for-a-new-repository

We have the infra to do this. We just need the code to run it as an E2E test, e.g.:

  1. A Python program (e.g. test_runner) to trigger the job and verify it ran
  2. A ksonnet Argo workflow to be used as the E2E test.

ldcastell commented 6 years ago

Cool, I think I can start looking into it.

jlewi commented 5 years ago

@idcastell @hougangliu any interest in taking this on?

456 provides an E2E test for the TFJob used by the GitHub Issue summarization example. That could be used as a model for doing the same for the object detection example.

In addition here are some other resources:

hougangliu commented 5 years ago

@jlewi I can take it on

hougangliu commented 5 years ago

/assign

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.