leap-stc / data-and-compute-team

Repo to organize issues/management of the LEAP Data and Computation Team
Apache License 2.0

Experiment with kbatch #4

Open SammyAgrawal opened 2 months ago

SammyAgrawal commented 2 months ago

kbatch notes

kbatch docs

What we tested

What we are missing

SammyAgrawal commented 2 months ago
  1. Verified that the basic hello.py script Yuri sent runs.
  2. Verified that externally hosted quay images can be run:
     kbatch job submit --name=my-image-job --image=$MY_CUSTOM_IMAGE --command='["python", "my_image.py"]' --code="my_image.py" -o name
  3. Still missing: in addition to writing to cloud buckets, how can a job write local files (e.g. to the user directory)? This seems important for things like model checkpoints, logs, or output Jupyter notebooks from papermill.

my_image.py:

import diffusers
import torch
import torch.nn as nn
print(diffusers)

Next step is running in conjunction with papermill (#5).

Edit: running with papermill succeeded. The issue now is how to save results: if a script writes to a file, it is unclear how that file can be retrieved afterwards. Is writing to cloud buckets the only way, or can we somehow access user directories?
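One way the "write to a cloud bucket" route could look for the papermill case is sketched below. This is only a sketch under assumptions: the bucket URL `gs://my-bucket/scratch` is a placeholder, the helper names `bucket_output_path` and `run_notebook` are made up for illustration, and it assumes the job image has papermill installed with its GCS support and that the job has credentials to write to the bucket.

```python
from pathlib import PurePosixPath


def bucket_output_path(bucket_url: str, job_name: str, notebook: str) -> str:
    """Build a destination URL for the executed notebook (hypothetical layout)."""
    stem = PurePosixPath(notebook).stem
    return f"{bucket_url.rstrip('/')}/{job_name}/{stem}-output.ipynb"


def run_notebook(notebook: str, bucket_url: str, job_name: str) -> str:
    """Execute the notebook and write the result straight to the bucket."""
    # Imported lazily so the path helper can be used without papermill installed.
    import papermill as pm

    out = bucket_output_path(bucket_url, job_name, notebook)
    # Assumption: papermill in the image can write to gs:// URLs
    # (GCS extra installed and bucket credentials available to the job).
    pm.execute_notebook(notebook, out)
    return out
```

Inside the job, `run_notebook("notebook_test.ipynb", "gs://my-bucket/scratch", "my-papermill-test")` would then leave the executed notebook in the bucket, where it can be fetched after the job finishes. Whether user directories can be reached instead is still the open question.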

SammyAgrawal commented 2 months ago

Unclear how to pass in multiple code files (a script as well as a Jupyter notebook).

kbatch job submit --name=my-papermill-test --image=$MY_CUSTOM_IMAGE --command='["python", "papermill_test.py"]' --code="papermill_test.py" -o name

^^ This is the base command. It submits fine, but the job fails because papermill_test.py itself requires access to another file, notebook_test.ipynb.

Tried: --code="papermill_test.py" --code "notebook_test.ipynb"
Failed with: python: can't open file '/code/papermill_test.py': [Errno 2] No such file or directory
Failed because: passing --code twice simply overwrites the earlier value, so this is equivalent to sending only notebook_test.ipynb.

Tried: --code='["papermill_test.py", "notebook_test.ipynb"]'
Failed with: FileNotFoundError: [Errno 2] No such file or directory: '["papermill_test.py", "notebook_test.ipynb"]'
Failed because: the job is never even submitted, because the whole string is interpreted as a single filename.

Tried: --code="papermill_test.py", "notebook_test.ipynb" and --code=["papermill_test.py", "notebook_test.ipynb"]
Failed with: Error: Got unexpected extra argument (notebook_test.ipynb])
Failed because: --code takes only one argument, even though the docs say a list can be passed.

Edit: solved! To pass in multiple code files, put everything you want to send into one directory and pass that directory instead: --code="test_file_dir"
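The directory workaround can be scripted. Below is a minimal sketch, assuming only the kbatch flags already shown in this thread; the helper names `stage_code_dir` and `build_submit_command` are made up for illustration. It copies the needed files into one directory and assembles the submit command without running it. It also assumes the contents of the directory end up directly under /code in the job, which has not been confirmed here.

```python
import json
import shutil
from pathlib import Path


def stage_code_dir(files, staging_dir):
    """Copy every file the job needs into a single directory and return it."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, staging / Path(f).name)
    return staging


def build_submit_command(name, image, entrypoint, code_dir):
    """Assemble the kbatch argv using only the flags seen in this thread."""
    return [
        "kbatch", "job", "submit",
        f"--name={name}",
        f"--image={image}",
        # Assumption: the entrypoint is found relative to where the code is unpacked.
        f"--command={json.dumps(['python', entrypoint])}",
        f"--code={code_dir}",
        "-o", "name",
    ]
```

For example, `stage_code_dir(["papermill_test.py", "notebook_test.ipynb"], "test_file_dir")` followed by `subprocess.run(build_submit_command("my-papermill-test", image, "papermill_test.py", "test_file_dir"), check=True)` would submit the job, where `image` holds the value of $MY_CUSTOM_IMAGE.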