iterative / terraform-provider-iterative

☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
https://registry.terraform.io/providers/iterative/iterative/latest/docs
Apache License 2.0

📚 Epic: Cloud data sync for not-dvc scenarios #211

Open dmpetrov opened 2 years ago

dmpetrov commented 2 years ago

Goal: recover deep learning jobs, minimize data sync for reusable machines (#209)

Cloud data sync: all data is synced through cloud storage (S3, etc.). This scenario does not include direct data sync from the user's laptop to a cloud instance.

First, we need to research best practices for data syncing. Open questions:

  1. Do we need a file watcher?
  2. DVC or rclone or ...?

0x2b3bfa0 commented 2 years ago

🔔 @dmpetrov & @iterative/cml, is this relevant after the iterative_task resource?

casperdcl commented 2 years ago

Yes, it is.

0x2b3bfa0 commented 2 years ago

efficiency (try to sync diffs rather than entire files)

Same company, two DVC implementations? 🤔

0x2b3bfa0 commented 2 years ago

workspace awareness (skip uploading [...])

dacbd commented 2 years ago

To me, the documentation describing the copy behavior of the task's storage reads as follows: with workdir = '.', everything gets copied to the instance, but given output = 'somedir', only files written into ./somedir get copied back?
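
A minimal sketch of the copy behavior as read in the comment above, assuming that reading is correct. It is not the provider's implementation: local temporary directories stand in for the workstation and the cloud instance, and `upload_workdir`, `download_output`, and `demo` are hypothetical names introduced only for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def upload_workdir(workdir: Path, instance: Path) -> None:
    """Copy everything under workdir to the (simulated) instance."""
    shutil.copytree(workdir, instance, dirs_exist_ok=True)


def download_output(instance: Path, workdir: Path, output: str) -> List[str]:
    """Copy back only files under instance/output; return their relative paths."""
    copied: List[str] = []
    source = instance / output
    if not source.is_dir():
        return copied
    for path in sorted(source.rglob("*")):
        if path.is_file():
            relative = path.relative_to(instance)
            target = workdir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(relative.as_posix())
    return copied


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp) / "local"        # plays the role of workdir = '.'
        instance = Path(tmp) / "instance"    # plays the role of the cloud machine
        workdir.mkdir()
        (workdir / "train.py").write_text("print('training')\n")

        upload_workdir(workdir, instance)

        # Simulate the job writing one file inside the output directory
        # and one file outside of it.
        (instance / "somedir").mkdir()
        (instance / "somedir" / "model.txt").write_text("weights\n")
        (instance / "scratch.log").write_text("not under output\n")

        copied = download_output(instance, workdir, "somedir")
        # Files written outside the output directory stay on the instance.
        assert not (workdir / "scratch.log").exists()
        return copied


if __name__ == "__main__":
    print(demo())
```

Under this reading, `scratch.log` never reaches the local directory, because only paths under the output directory are copied back.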