htcondor / htmap

High-Throughput Computing in Python, powered by HTCondor
https://htmap.readthedocs.io
Apache License 2.0

Launch interactive jobs for development/debugging #204

Open stsievert opened 4 years ago

stsievert commented 4 years ago

Is your feature request related to a problem? Please describe. I have a job that requires a GPU and specifies a Docker image. I need to debug on this image.

Currently, my solution is to launch an EC2 machine, copy my files over, and then start developing/debugging.

Describe the solution you'd like A method to land an interactive job on this image. Something like https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#interactive-jobs

Describe alternatives you've considered

stsievert commented 4 years ago

I would imagine that HTMap has a submit file it uses internally. Could that submit file be used to generate a debugging/development submit file? I think I'd like something like this:

import htmap
options = {...}
future = htmap.map(..., map_options=htmap.MapOptions(**options))
future.debugging_submit_file

future.debugging_submit_file would specify my Docker image and transfer all the Python files HTMap needs. It'd hopefully have a comment detailing how to submit with condor_submit.

This would enable debugging on HTCondor without needing to rent an EC2 instance to debug (my current solution). Would that be possible?
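
To make the request concrete, here is a minimal sketch of what such a generated submit file might contain. The helper `debugging_submit_description` is hypothetical (HTMap does not provide it), and the image name and file names are placeholders. The submit commands themselves (`universe = docker`, `docker_image`, `request_gpus`, `transfer_input_files`) are standard HTCondor ones; whether a given pool allows interactive Docker-universe jobs would need checking.

```python
def debugging_submit_description(docker_image, input_files, request_gpus=1):
    """Return the text of a submit description for an interactive debugging job.

    Hypothetical helper for illustration only; this is not part of HTMap.
    """
    lines = [
        # The comment the issue asks for: how to submit the file by hand.
        "# Submit with: condor_submit -interactive debug.sub",
        "universe = docker",
        f"docker_image = {docker_image}",
        f"request_gpus = {request_gpus}",
        "should_transfer_files = YES",
        # Files to copy to the execute machine, comma-separated.
        "transfer_input_files = " + ", ".join(input_files),
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder image and file names, chosen only for the example.
    print(debugging_submit_description("example/image:latest", ["train.py", "launch.py"]))
```

Writing the returned text to `debug.sub` and running the command in its first comment line would be the manual path that the Python-side proposals in this thread aim to hide.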

JoshKarpel commented 4 years ago

I really like this idea! But, I want to be careful about how we implement it. It would be possible to generate the submit description for a single component, but I'd prefer a solution that "keeps you inside Python", since the intent is to wrap up the low-level HTCondor operations behind Python(ic) APIs.

I'm thinking of something like...

htmap.interactive(func, args, kwargs, map_options=...)

which would then connect you to the job (i.e., put you in a shell) once it starts running. I'll ask around about interactive submits and condor_ssh_to_job and see what's possible.
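
As a rough sketch of the two mechanisms mentioned here, the snippet below builds the command lines for an interactive submit and for attaching to an already-running job. The function names are hypothetical and this is not how HTMap implements anything; it only assumes that the `condor_submit` and `condor_ssh_to_job` command-line tools are on the PATH of the submit machine.

```python
import subprocess


def interactive_command(submit_file):
    """Command line for an interactive submit of an existing submit file."""
    return ["condor_submit", "-interactive", str(submit_file)]


def ssh_to_job_command(cluster_id, proc_id=0):
    """Command line for opening a shell next to a running job."""
    return ["condor_ssh_to_job", f"{cluster_id}.{proc_id}"]


def interactive(submit_file):
    """Hypothetical stand-in for the proposed htmap.interactive.

    Blocks until the interactive shell exits, then returns its exit code.
    """
    return subprocess.call(interactive_command(submit_file))
```

A wrapper along these lines would still need HTMap to produce the submit file from `map_options`, which is the part that keeps the user inside Python.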

stsievert commented 4 years ago

> I'd prefer a solution that "keeps you inside Python", ... I'm thinking of something like...

It'd be great to launch the interactive job from Python! That'd remove a lot of the HTCondor details.

I primarily use these interactive jobs for developing a single script, and would use bash on this remote machine to run the script over and over. I'd probably use it like this:

submit2:~ $ ls
launch.py  finished.py  train.py
submit2:~ $ python
>>> import htmap
>>> htmap.interactive(map_options=...)
# hangs while job launches
remote-machine:~ $ ls
launch.py  finished.py  train.py
remote-machine:~ $ python train.py
# make edits to train.py
remote-machine:~ $ python train.py
remote-machine:~ $ exit
>>> # back on submit2