ploomber / soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.
https://soopervisor.readthedocs.io
Apache License 2.0

error in cp_ploomber_home #63

Closed edublancas closed 2 years ago

edublancas commented 2 years ago

cp_ploomber_home replaces the .tar.gz instead of adding the new files to it. I added a test case in the cp-ploomber-home branch.

edublancas commented 2 years ago

I disabled the call to cp_ploomber_home until we find a fix. I found that Python doesn't support appending to tar.gz files https://stackoverflow.com/a/30984372/709975
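
For reference, a minimal sketch of the limitation using only the standard library `tarfile` module. The file names and the `demo` helper are made up for illustration and are not taken from soopervisor. Opening a gzip-compressed archive in append mode raises an error, and reopening it in write mode truncates it, which would explain the original contents disappearing:

```python
import tarfile
import tempfile
from pathlib import Path


def make_archive(archive, path):
    """Create (or overwrite) a .tar.gz that contains a single file."""
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(path), arcname=path.name)


def names_in(archive):
    """Return the sorted member names of a .tar.gz."""
    with tarfile.open(str(archive), "r:gz") as tar:
        return sorted(tar.getnames())


def demo():
    # Hypothetical file names, for illustration only
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        source, stats = tmp / "pipeline.yaml", tmp / "stats"
        source.write_text("tasks: []")
        stats.write_text("{}")
        archive = tmp / "project.tar.gz"

        make_archive(archive, source)

        # Append mode is not available for compressed archives
        try:
            tarfile.open(str(archive), "a:gz")
            append_error = None
        except ValueError as e:
            append_error = e

        # "w:gz" starts a new archive, so pipeline.yaml is lost
        make_archive(archive, stats)
        return append_error, names_in(archive)
```

Calling `demo()` returns the `ValueError` from the append attempt and the member list after the second write, which only contains `stats`.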

idomic commented 2 years ago

It does under certain flags. Play with it a bit. How are the tests passing if the files aren't there?


edublancas commented 2 years ago

I tried changing the flags; it doesn't work (check the SO link). Appending to a tar.gz is not supported.

The tests passed because the only test we have checks that the config files are there, but it doesn't check that the rest of the source code is there. I added a new test to check that.

idomic commented 2 years ago

So why do we want this as part of the tar and not simply in the folder, like it was before the change? The files did get copied to a dir and then into the image. It's either that or unpacking and repacking the tar file.
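
A rough sketch of the second option (unpack, add the extra files, repack), assuming only the standard library. `add_to_targz` and its parameters are hypothetical names for illustration, not existing soopervisor functions:

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def add_to_targz(archive, extra_dir, arcname):
    """Rebuild a .tar.gz so it also contains extra_dir under arcname.

    Hypothetical helper: extracts everything to a temporary directory,
    copies the extra directory next to it, and writes a new archive.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Unpack the current contents (the archive is assumed to be trusted)
        with tarfile.open(str(archive), "r:gz") as tar:
            tar.extractall(tmp)

        # Add the new files alongside the extracted ones
        shutil.copytree(str(extra_dir), str(Path(tmp, arcname)))

        # Write a fresh archive with the old and the new contents
        with tarfile.open(str(archive), "w:gz") as tar:
            for child in sorted(Path(tmp).iterdir()):
                tar.add(str(child), arcname=child.name)
```

This works, but it pays for a full extract and recompress on every export, which is one reason to prefer preparing the files in a folder before the tar is created.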

edublancas commented 2 years ago

Yes, I think that's the way to go.

A few notes:

  1. The function that copies the user's code into the image copies the project's entire file tree, so anything under it goes into the Docker image.
  2. The env_name/ directory contains config files that aren't needed for running the pipeline in Docker. It ends up in the Docker image as a consequence of the previous point, but that's fine: we can use it as a kind of "staging" area for preparing the files to copy.

My suggestion is to create a temporary directory within env_name/ with the files that we want to put in the Docker image. So if the project looks like this:

env_name/
  Dockerfile

During soopervisor export, it will look like this:

env_name/
  Dockerfile
  tmp/
     # put everything here - for now it'll only have a copy of  ~/.ploomber/stats
     .ploomber/config/stats

Do not copy ~/.ploomber/examples, since that would make the image unnecessarily large.

After creating the tar file, we delete tmp/:

env_name/
  Dockerfile
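
The staging idea above could be sketched roughly like this. It is only an illustration under assumptions: `export_with_staging` and its parameters are hypothetical names, the whole Ploomber home directory is copied minus `examples` (the proposal only needs the stats file for now), and the archive is assumed to live outside the project tree:

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def export_with_staging(project_root, env_name, ploomber_home, archive):
    """Stage extra files in env_name/tmp/, build the tar, then clean up.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    project_root = Path(project_root)
    staging = project_root / env_name / "tmp"
    try:
        # Copy the home directory into the staging area, skipping examples/
        shutil.copytree(
            str(ploomber_home),
            str(staging / ".ploomber"),
            ignore=shutil.ignore_patterns("examples"),
        )
        # Everything under the project tree goes into the tar in one pass,
        # so there is no need to append to an existing archive
        with tarfile.open(str(archive), "w:gz") as tar:
            for child in sorted(project_root.iterdir()):
                tar.add(str(child), arcname=child.name)
    finally:
        # Delete tmp/ whether or not the export succeeded
        shutil.rmtree(str(staging), ignore_errors=True)
```

Because the extra files are in place before the tar is written, the archive is created once and the project directory is left as it was.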

edublancas commented 2 years ago

fixed this