huggingface / Google-Cloud-Containers

Including Hugging Face Deep Learning Containers for Google Cloud
Apache License 2.0

CI Pipeline which builds & tests the container #4

Open philschmid opened 7 months ago

philschmid commented 7 months ago

To make sure our Hugging Face DLCs are well tested, we need to create "integration" tests that run different kinds of training using the container. Those tests should run automatically or on demand. We can use GitHub Actions as CI for running the tests, and Python + Docker to implement the integration tests.

Until #3 is implemented, we can use existing containers, e.g. from transformers, to run the tests. For the test scripts, I think we can use the existing "examples/" from transformers, peft, or trl. We could structure the tests/ folder maybe into:

Example for a test:

  1. build a container
  2. start the container on a GPU
  3. run a training using the container (few steps)
  4. validate the results
  5. stop the container -> repeat 1-4 with other tests.
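
The steps above could be sketched in Python roughly as follows. This is only a minimal sketch, not a settled design: it assumes the tests shell out to the `docker` CLI, that the host has the NVIDIA container runtime so `--gpus all` works, and that a run can be validated by looking for a marker string in the training logs. `ContainerTest`, its fields, the `dlc-test-` tag prefix, and the example paths are hypothetical names made up for illustration.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ContainerTest:
    """One integration test case (hypothetical structure, for illustration)."""
    name: str             # used to derive the image tag, so keep it lowercase
    dockerfile_dir: str   # build context containing the Dockerfile
    train_command: str    # command executed inside the container
    expected_marker: str  # text that must appear in the training logs


def build_cmd(test: ContainerTest, tag: str) -> List[str]:
    # Step 1: build the image from the given build context.
    return ["docker", "build", "-t", tag, test.dockerfile_dir]


def run_cmd(test: ContainerTest, tag: str) -> List[str]:
    # Steps 2, 3 and 5: start the container with GPU access and run the
    # training; --rm removes the container once the command exits.
    return ["docker", "run", "--rm", "--gpus", "all", tag,
            *shlex.split(test.train_command)]


Runner = Callable[[Sequence[str]], str]


def subprocess_runner(cmd: Sequence[str]) -> str:
    """Run a command, raise on a non-zero exit code, return its output."""
    result = subprocess.run(cmd, check=True, text=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return result.stdout


def run_test(test: ContainerTest, runner: Runner = subprocess_runner) -> bool:
    tag = f"dlc-test-{test.name}"
    runner(build_cmd(test, tag))
    logs = runner(run_cmd(test, tag))
    # Step 4: validate the results (assumption: a log marker is sufficient).
    return test.expected_marker in logs
```

Passing the runner as a parameter keeps the command construction testable on machines without Docker or a GPU; on a GPU instance, a list of `ContainerTest` cases could simply be looped over with the default `subprocess_runner`.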

In addition to "local" tests running on GPU instances, we should also run validation tests for GKE and Vertex AI.

philschmid commented 7 months ago

For access to GCP you can ask @glegendre01.