Open cisaacstern opened 2 years ago
@cisaacstern I have been searching for the issue which details this but can't seem to locate it. In our discussions with the Prefect team there are 2 versioning compatibility issues to account for. The first is the Python version used for flow serialization, this must match the Python version used on the container where the Prefect agent is running for flow deserialization to function properly. The second is the Prefect Agent version. According to the Prefect support team this should be backward compatible so the Prefect Agent will support flows created with any =< version.
I'd like to find a canonical link to Prefect documentation or an issue discussion which confirms this so we can reference it here.
Dynamically setting the image for KubernetesRun
and the DaskExecutor
is what happens in pangeo-forge-prefect
currently for AKS bakeries so I can confirm this should be working, what is the error you are seeing in your environment. See here and here for examples of this dynamic registration.
@cisaacstern For reference this is a previous comment where I discussed some of our image versioning concerns https://github.com/pangeo-forge/roadmap/issues/18#issuecomment-840901574
Thanks for weighing in, @sharkinsspatial. I now have a version of this idea working in https://github.com/pangeo-forge/registrar/pull/12.
what is the error you are seeing in your environment
I was seeing an version incompatibility problem in the worker container (the specifics might not matter, but it was an s3fs
+ aiobotocore
mismatch). Upgrading fsspec/s3fs/gcsfs in the 0.5.0
worker image seems to have solved this; xref https://github.com/pangeo-forge/pangeo-forge-bakery-images/pull/28.
I am now unsure if this was actually an image incompatibility problem, or rather that the prior release of the 0.5.0
image was itself broken, and required an fsspec/s3fs/gcsfs upgrade just to work at all. These two scenarios are hard to disambiguate now that I have the following three images
graph LR;
registrar --> agent --> worker;
all using fsspec/s3fs/gcsfs == 2011.11.0
.
What I'm not clear on, however, is what would happen if one or multiple of the above images had different fsspec/s3fs/gcsfs versions than the others. We will encounter this scenario quite soon, though, because recent releases of pangeo-forge-recipes
use later versions of fsspec et al than 2011.11.0
.
Dynamically selecting an versioned Docker image for a given recipe execution seems quite important. We currently have multiple released versions of
pangeo-forge-recipes
, and even if current recipe developers only use the latest version, that doesn't solve the problem of released feedstocks having been built with specific older versions. We always need to be able re-run all released feedstocks.As a concrete example, our released NOAA OISST feedstock was released with
pangeo-forge-recipes==0.5.0
, which is already many versions behind the bakery image (pinned topangeo-forge-recipes==0.6.1
) that was used to deploy our current bakery:My question for @tracetechnical and @sharkinsspatial is, should we expect that, if at the time of registration we change the value of
image
in theKubernetesRun
andDaskExecutor
...https://github.com/pangeo-forge/pangeo-forge-gcs-bakery/blob/2d164b4dacb0353d1df11ebf996493e5108bc7bf/test/recipes/oisst_recipe.py#L69-L71
https://github.com/pangeo-forge/pangeo-forge-gcs-bakery/blob/2d164b4dacb0353d1df11ebf996493e5108bc7bf/test/recipes/oisst_recipe.py#L76-L80
...that this will create workers which can execute recipes with a different environment from the Prefect Agent?
As far as I can tell, the only place that the
BAKERY_IMAGE
used to deploy the bakery is actually used is in the Prefect Agent deployment spechttps://github.com/pangeo-forge/pangeo-forge-gcs-bakery/blob/2d164b4dacb0353d1df11ebf996493e5108bc7bf/kubernetes/prefect-agent.deployment.yaml#L50
...so dynamically setting the
image
for theKubernetesRun
andDaskExecutor
should be possible? I've been experimenting with this for ... a few hours now 🙃 ... and not quite able to get it to work yet, so just wanted to check that I wasn't overlooking something obvious.cc @rabernat @jhamman