ijiraq / gemini_processing

Scripts and stuff to process Gemini data within the CADC.

Run the container in batch mode #15

Closed ijiraq closed 3 years ago

ijiraq commented 4 years ago

We need to run a few hundred jobs with this processing. To do this we can build a VM that has docker on it and then run these jobs as docker run jobs on that VM.
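A minimal sketch of what "jobs as docker run jobs" could look like, assuming each job is one container invocation that takes a single argument. The image tag and the observation IDs below are hypothetical placeholders, not names from this repository.

```python
import shlex

# Hypothetical placeholder: not the real image tag for this project.
IMAGE = "example/gemini-processing:latest"


def docker_run_command(image, job_arg):
    """Return the argv list for one batch job: one throwaway container per input."""
    # --rm removes the container when the job exits, so stopped containers
    # do not pile up on the VM disk.
    return ["docker", "run", "--rm", image, job_arg]


def job_lines(image, job_args):
    """Return one shell-quoted command line per job."""
    return [shlex.join(docker_run_command(image, arg)) for arg in job_args]


if __name__ == "__main__":
    # Hypothetical job inputs; a real run would list a few hundred of these.
    for line in job_lines(IMAGE, ["obs_0001", "obs_0002", "obs_0003"]):
        print(line)
```

The printed lines could then be pasted into whatever submit file the batch system expects; how that file is laid out is not covered here.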

sfabbro commented 4 years ago

Making the VM and running docker run on it should work as a short-term hack, but not as a long-term solution. Here are some quick thoughts on what to check:

ijiraq commented 4 years ago

  • the container image has to be on the VM image (not on a separate volume)

Could the VM pull the container from Docker at run time? It might be quick enough?

sfabbro commented 4 years ago

  • the container image has to be on the VM image (not on a separate volume)

Could the VM pull the container from Docker at run time? It might be quick enough?

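A script on the VM could pull the image at run time but:

  • the container registry is not on the same network, and the batch system would pull hundreds of images simultaneously
  • we would have to make sure there is only one image per VM while there may be several jobs on the VM.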

Nat1405 commented 4 years ago

Hi all, I'm just playing around in Batch and noticed I have a year-old VM in there (not sure how it got there). @dbohlender does this belong to you? I'm just checking before I delete it.

dbohlender commented 4 years ago

Yes, go ahead and delete it. About once a year I try to use a VM to generate some non-LTE model atmospheres and spectrum synthesis. And then forget how to do it again...

Nat1405 commented 4 years ago

Perfect, thank you!! (Ha, sounds like a job for a container...........)

dbohlender commented 4 years ago

Yes, exactly!!!

Nat1405 commented 4 years ago

  • the container image has to be on the VM image (not on a separate volume)

Could the VM pull the container from Docker at run time? It might be quick enough?

A script on the VM could pull the image at run time but:

  • the container registry is not on the same network, and the batch system would pull hundreds of images simultaneously
  • we would have to make sure there is only one image per VM while there may be several jobs on the VM.

@sfabbro thanks for your help. Could you elaborate on why there should only be one image per VM? Is that just to limit unnecessary downloads of the container image (say, from DockerHub)?

Nat1405 commented 4 years ago

Also to make sure I'm not just going down a rabbit hole, @ijiraq @dbohlender I'm following the instructions here for setting up Batch jobs. Seem like a good place to start?

dbohlender commented 4 years ago

Sorry, was preparing for and then attending a management meeting. Those were the instructions that I followed, but only up to the batch processing part. I've not yet done any batch work.

sfabbro commented 4 years ago

@sfabbro thanks for your help. Could you elaborate on why there should only be one image per VM? Is that just to limit unnecessary downloads of the container image (say, from DockerHub)?

It is not an absolute necessity, but it will save you space, given that you only have 20G and docker images are large, especially if you use anaconda. The root file system fills up quickly, even with the shared-layers niceness of docker. If you want to run several simultaneous jobs per VM, you will also have to make sure that one job pulling a docker image does not conflict with another job downloading the same docker image on the same VM. Anyway, if you really want to do docker run yourself on the VM, have the image pulled before snapshotting the VM to avoid unnecessary downloads.
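One way to keep simultaneous jobs on the same VM from pulling over each other is a lock file around the pull. This is only a sketch under assumptions: the lock path is a hypothetical choice, and it relies on `docker image inspect` exiting non-zero when the image is not present locally. The `run` parameter exists only so the logic can be exercised without a docker daemon.

```python
import fcntl
import subprocess


def image_is_cached(image, run=subprocess.run):
    """Return True if the local docker daemon already has the image."""
    # `docker image inspect` exits non-zero when the image is not present.
    result = run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def pull_once(image, lock_path="/tmp/docker-pull.lock", run=subprocess.run):
    """Pull the image at most once per VM, even if several jobs start together.

    Returns True if this call did the pull, False if the image was already there.
    The default lock_path is a hypothetical location, not a docker convention.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until any other job holding the lock has finished.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if image_is_cached(image, run):
                return False  # provided by another job or by the VM snapshot
            run(["docker", "pull", image], check=True)
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

If the image is pulled before snapshotting the VM, as suggested above, `pull_once` simply returns False for every job and nothing is downloaded.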

ijiraq commented 4 years ago

@sfabbro: good insights into docker/containers/images. I guess with 'docker run' only the first job to load the image on a VM will actually pull from docker-hub; the others will get it from the cache. To keep the pull from docker-hub minimal, a good root image will help, with just the minimal nifty bits coming in at launch time.

But there are things to think about so that this does not become an abuse of the network.
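A rough back-of-envelope for the network load, as a sketch only. All numbers are hypothetical (300 jobs, 4 jobs sharing each VM, a 3 GB image, 0.25 GB of layers on top of a root image already on the VM); none of them are measurements from this project.

```python
import math


def vm_count(n_jobs, jobs_per_vm):
    """Number of VMs needed if each VM's image cache is shared by jobs_per_vm jobs."""
    return math.ceil(n_jobs / jobs_per_vm)


def registry_traffic_gb(n_pulls, gb_per_pull):
    """Total volume downloaded from the registry for n_pulls pulls."""
    return n_pulls * gb_per_pull


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    n_jobs, jobs_per_vm = 300, 4
    full_image_gb, top_layers_gb = 3.0, 0.25
    n_vms = vm_count(n_jobs, jobs_per_vm)

    # Every job pulls the whole image (no cache at all).
    print("pull per job:", registry_traffic_gb(n_jobs, full_image_gb), "GB")
    # Only the first job on each VM pulls; the rest hit the local cache.
    print("pull per VM:", registry_traffic_gb(n_vms, full_image_gb), "GB")
    # Root image baked into the VM snapshot; only the top layers are pulled.
    print("top layers per VM:", registry_traffic_gb(n_vms, top_layers_gb), "GB")
```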

ijiraq commented 4 years ago

Also to make sure I'm not just going down a rabbit hole, @ijiraq @dbohlender I'm following the instructions here for setting up Batch jobs. Seem like a good place to start?

Please do follow them, and if you find anything wrong in there, take notes so we can update the manual.

Nat1405 commented 4 years ago

The first run of the container in batch mode has happened; now it's time for tweaking. I added rough notes on my work so far in d5eebcfd33c715.