pangeo-forge / pangeo-forge-runner

Run pangeo-forge recipes on Apache Beam
https://pangeo-forge-runner.readthedocs.io
Apache License 2.0
8 stars 9 forks source link

Infer Runner Deps from Recipe requirements.txt #154

Closed ranchodeluxe closed 7 months ago

ranchodeluxe commented 10 months ago

Problem

I apologize for the range of this question b/c I got here through some Flink runner frustrations. For the runner we currently have a bunch of unlisted runtime dependencies that need to be installed to execute pangeo-forge-runner bake command:

# the storage look up and dependency injections require some form of these deps 
fsspec  # or possibly s3fs, gsfs depending on what's used in config
pangeo-forge-recipes
apache-beam  # which has to match the exact python and beam versions for the producer and worker (Flink specific)

Possible Solution

AFAIK for pangeo-forge-recipes>=0.10.0 we require a requirements.txt to be in the feedstock. So I want to build on (#130) and ask if there any reason we can't do something automated (but slightly gross) like this really basic example diff?

Thoughts?

ranchodeluxe commented 10 months ago

If the above solution is too gross (which I understand) another option is to write a bunch of validation (most of it just Flink specific) that checks things like:

sbquinlan commented 10 months ago

I had this issue recently. Here's how you can easily get into a messed up state:

conda create env -n pangeo_test -c conda-forge pangeo-forge-recipes
pip install "pangeo-forge-runner[extras]"

pangeo-forge-runner has an optional dependency of apache-beam[gcp] which last exists in v2.42.0. pip will install that version over the latest version that pangeo-forge-recipes installs from conda-forge (v2.51.0 at time of writing). When you run a FlinkDeployment using the runner, it'll correctly build the FlinkDeployment with whatever container_image you specify, but the pipeline.run() call will fail because apache-beam will default to the old version v2.42.0 which isn't compatible with the latest Flink version.

I think it's important here to either pass down a specific jar in the pipeline options or validate that the apache-beam version installed is similar to what is being configured in the FlinkDeployment.

cisaacstern commented 10 months ago

Sorry for the delayed response here.

I certainly agree that client/runner version mismatch is a huge headache and we should do something about that.

To understand, is this issue in a way a version of https://github.com/pangeo-forge/pangeo-forge-runner/issues/27, in which @yuvipanda proposed having a separate fetcher/environment provisioner?

ranchodeluxe commented 10 months ago

To understand, is this issue in a way a version of https://github.com/pangeo-forge/pangeo-forge-runner/issues/27, in which @yuvipanda proposed having a separate fetcher/environment provisioner?

No, this example diff isn't intended to be a solution in that direction. But I guess the question is that ticket (which seems to be over a year old and referring to the old "orchestrator" idea) still the vision? First time seeing it

cisaacstern commented 10 months ago

But I guess the question is that ticket (which seems to be over a year old and referring to the old "orchestrator" idea) still the vision?

This is a good point for discussion. Regarding references to the "orchestrator", I think we can view #27 as if that does not refer to pangeo-forge/pangeo-forge-orchestrator, and rather means just "little o" orchestrator, as in the person or whom/whatever is calling pangeo-forge-runner to deploy jobs.

One idea mentioned in #27 which definitely feels worthwhile to me, is that if we are going to approach automatic environment modification in any form, that should definitely be happening in temporary venvs of some sort, as is (IIUC) done in pre-commit, tox, nox, etc.

Over in the GitHub Action, I do a version of what your example diff is proposing:

https://github.com/pangeo-forge/deploy-recipe-action/blob/868f7dd2bd02ec506c7c822f958813373ebc751b/action/deploy_recipe.py#L115-L118

Which feels appropriate to me because the Action is only ever called within the context of an ephemeral GitHub Action container, so we are not going to be modifying (human) user environments on-the-fly with that code.

In pangeo-forge-runner itself, though, modifying the same environment that the CLI is invoked in feels innappropriate, and could conceivably lead to even more confusion, insofar as users environments are being changed underneath them.

I would be curious to understand more about how pre-commit, tox, nox etc. manage ephemeral venvs, and if there is something we can borrow from any of those projects on this topic.

ranchodeluxe commented 10 months ago

In pangeo-forge-runner itself, though, modifying the same environment that the CLI is invoked in feels innappropriate, and could conceivably lead to even more confusion, insofar as users environments are being changed underneath them.

I would be curious to understand more about how pre-commit, tox, nox etc. manage ephemeral venvs, and if there is something we can borrow from any of those projects on this topic.

Agreed, let me start looking into options down this pathway

yuvipanda commented 10 months ago

YESSSS I THINK STEALING WHAT pre-commit DOES IS EXACTLY THE RIGHT WAY TO GO HERE

yuvipanda commented 7 months ago

fucking amazing