samsoe / mpg_aerial_survey


Machine image updates #15

Open samsoe opened 1 year ago

samsoe commented 1 year ago
mosscoder commented 1 year ago

startup env needs:

pip3 install --upgrade pip
sudo apt-get install gdal-bin libgdal-dev libspatialindex-dev
samsoe commented 1 year ago

Ultimately, I think it makes sense to have the pip3 upgrade and the gdal packages in the template machine image. However, now that we have a working array, I'm leaning toward keeping the current instances until after the spurge surveys are complete. As an immediate fix, I propose installing pip and gdal from startup.py. This would trigger the installations on every startup, which is a bit redundant but a minor cost.
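As a rough illustration of the startup.py approach, here is a minimal sketch. The commands come from the "startup env needs" list earlier in this thread. The function name `install_startup_deps`, the `run` parameter, and the `-y` flag on apt-get are assumptions added for illustration, not the existing contents of startup.py.

```python
import subprocess

# Commands taken from the "startup env needs" list in this thread.
# Assumption: -y is added so apt-get does not wait for confirmation
# when the script runs unattended at boot.
STARTUP_COMMANDS = [
    ["pip3", "install", "--upgrade", "pip"],
    ["sudo", "apt-get", "install", "-y",
     "gdal-bin", "libgdal-dev", "libspatialindex-dev"],
]


def install_startup_deps(run=subprocess.run):
    """Run each install command in order, raising on the first failure.

    `run` is injectable (hypothetical design choice) so the function can
    be exercised without touching the machine.
    """
    for cmd in STARTUP_COMMANDS:
        # check=True makes subprocess.run raise CalledProcessError
        # if a command exits with a non-zero status.
        run(cmd, check=True)
```

startup.py would call `install_startup_deps()` before any step that imports gdal. Both commands are effectively no-ops when everything is already current, which is why repeating them on each startup is redundant but cheap.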

mosscoder commented 1 year ago

Yes, I agree the long-term plan would be to manage the environment with Conda, specifying exact versions of each dependency so that our pipeline does not break down if a dependency introduces breaking changes. Right now there is a small risk that an upstream release could pull the rug out from under us in this way.

pip and gdal install/update are currently addressed in post_process.py.

Let's wait until July to address this. Basically, I'll run pip freeze to get a list of working dependencies and export it to a .yaml file, then add to the yaml as we add functionality. The parent image for all child images should have conda installed and the survey env established. We then activate the survey env in startup.py.
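The freeze-to-yaml step could look something like the sketch below. It wraps `pip freeze` output in a conda environment file by placing the pinned packages under the `pip:` section. The function name `freeze_to_conda_yaml` and the default env name `survey` are assumptions for illustration; the real file may pin some packages through conda channels instead.

```python
def freeze_to_conda_yaml(freeze_lines, env_name="survey", python_version=None):
    """Build conda environment-file text from `pip freeze` output lines.

    Assumption: every pinned package is installed through pip inside the
    conda env, so all pins go under the nested `pip:` list.
    """
    pins = []
    for raw in freeze_lines:
        line = raw.strip()
        # Skip blank lines and comment lines in the freeze output.
        if line and not line.startswith("#"):
            pins.append(line)

    lines = [f"name: {env_name}", "dependencies:"]
    if python_version:
        # Optionally pin the interpreter as well.
        lines.append(f"  - python={python_version}")
    lines.append("  - pip")
    lines.append("  - pip:")
    lines.extend(f"      - {pin}" for pin in pins)
    return "\n".join(lines) + "\n"
```

The input lines could come from `subprocess.run(["pip3", "freeze"], capture_output=True, text=True, check=True).stdout.splitlines()`, and the returned text written to a .yaml file that the parent image builds the env from with `conda env create -f`.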

mosscoder commented 1 year ago

After learning about Compute Engine, this is the direction we'll want to go. It will greatly simplify the workflow: https://cloud.google.com/hpc-toolkit/docs/quickstarts/slurm-cluster

Furthermore, the machines are CPU-limited, not memory-limited. We will want to shift to compute-optimized machines (with faster cores) when we refactor to use Compute Engine.