This repository contains a sample project using CML with DVC to push/pull data from cloud storage and track model metrics. When a pull request is made in this repository, the following will occur:
- A runner is provisioned with the CML environment
- DVC pulls the data from cloud storage
- The runner trains the model by running `python train.py`
- CML returns a report with the model metrics as a comment in the pull request
The key file enabling these actions is `.github/workflows/cml.yaml`.
In this example, `.github/workflows/cml.yaml` references the following environment variables, which are stored as repository secrets.
| Secret | Description |
|---|---|
| `GITHUB_TOKEN` | This is set by default in every GitHub repository. It does not need to be manually added. |
| `AWS_ACCESS_KEY_ID` | AWS credential for accessing S3 storage |
| `AWS_SECRET_ACCESS_KEY` | AWS credential for accessing S3 storage |
| `AWS_SESSION_TOKEN` | Optional AWS credential for accessing S3 storage (only needed if MFA is enabled) |
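You can add the AWS secrets in the repository Settings page, or, as a sketch of one alternative (assuming you have the GitHub CLI installed; the values below are placeholders), set them from the command line:

```bash
# Add the AWS credentials as repository secrets using the GitHub CLI.
# The values are placeholders; GITHUB_TOKEN does not need to be added.
gh secret set AWS_ACCESS_KEY_ID --body "<your-access-key-id>"
gh secret set AWS_SECRET_ACCESS_KEY --body "<your-secret-access-key>"
gh secret set AWS_SESSION_TOKEN --body "<your-session-token>"   # optional, only if MFA is enabled
```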
DVC works with many kinds of remote storage. To configure this example for a different cloud storage provider, see our documentation in the CML repository.
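For example, a minimal sketch of pointing DVC at a different remote looks like the following; the remote name `storage` and the bucket/container paths are placeholders:

```bash
# Set a default DVC remote for this project (choose one provider).
dvc remote add -d storage s3://<your-bucket>/cml-dvc-example         # Amazon S3
# dvc remote add -d storage gs://<your-bucket>/cml-dvc-example       # Google Cloud Storage
# dvc remote add -d storage azure://<your-container>/cml-dvc-example # Azure Blob Storage
```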
Note that if you clone this project, you will have to configure your own DVC storage and credentials for the example. We suggest the following procedure (consolidated into a shell sketch after this list):

1. Run `python get_data.py` to generate your own copy of the dataset.
2. After initializing DVC in the project directory and configuring your remote storage, run `dvc add data` and `dvc push` to push your dataset to remote storage.
3. Run `git add`, `git commit`, and `git push` to push your DVC configuration to GitHub.
4. Copy `.github/workflows/cml.yaml` from this repository to your fork. By default, workflow files are not copied in forks. When you commit this file to your repository, the first workflow should be initiated.
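The steps above can be run roughly as follows; this is a sketch only, and the bucket path and commit message are placeholders you should replace with your own:

```bash
# 1. Generate a local copy of the dataset.
python get_data.py

# 2. Initialize DVC, configure your own remote (placeholder bucket path),
#    then track the dataset and push it to remote storage.
dvc init
dvc remote add -d storage s3://<your-bucket>/cml-dvc-example
dvc add data
dvc push

# 3. Commit the DVC configuration and the data pointer file to GitHub.
git add .dvc/config data.dvc .gitignore
git commit -m "Configure DVC remote and track dataset"
git push

# 4. Copy .github/workflows/cml.yaml into your fork manually
#    (workflow files are not copied to forks by default).
```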