pangeo-forge / pangeo-forge-runner

Run pangeo-forge recipes on Apache Beam
https://pangeo-forge-runner.readthedocs.io
Apache License 2.0

Infer Runner Deps from Recipe Requirements #166

Closed: ranchodeluxe closed this 7 months ago

ranchodeluxe commented 8 months ago

Addresses https://github.com/pangeo-forge/pangeo-forge-runner/issues/154

Needs https://github.com/pangeo-forge/pangeo-forge-runner/pull/170

Goal

Developers 'should' only need to pip install pangeo-forge-runner (plus the implicit dependencies in pyproject.toml). When they run a recipe, everything needed to run it comes from the recipe's requirements.txt, including apache-beam, fsspec providers, and any extra packages needed for their transforms.
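
For illustration, a recipe's requirements.txt under this model might look like the following (the packages and pins here are hypothetical, not taken from this PR):

```
# Everything the recipe needs to run, declared in one place:
apache-beam==2.52.0          # the Beam SDK itself
pangeo-forge-recipes==0.10.4 # the recipes library
gcsfs                        # fsspec provider for the recipe's storage target
s3fs                         # extra fsspec provider, if the recipe needs one
```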

Next Steps

POC works for the DirectRunner in Beam.

ranchodeluxe commented 8 months ago

I actually got EVERYTHING 🥳 working on the runner side using the pre-commit API internally 😬 (that will change), but it seems to fail when submitting the job on the Beam Python SDK side of things 😞:

```
==================== <function read_to_impulse at 0x7f8099bcaf80> ====================
==================== <function impulse_to_input at 0x7f8099bcb010> ====================
==================== <function sort_stages at 0x7f8099bcb250> ====================
==================== <function add_impulse_to_dangling_transforms at 0x7f8099bcb370> ====================
==================== <function setup_timer_mapping at 0x7f8099bcb1c0> ====================
==================== <function populate_data_channel_coders at 0x7f8099bcb2e0> ====================
starting control server on port 36945
starting data server on port 38025
starting state server on port 40469
starting logging server on port 46333
Created Worker handler <apache_beam.runners.portability.fn_api_runner.worker_handlers.SubprocessSdkWorkerHandler object at 0x7f80999cbfd0> for environment ref_Environment_default_environment_1 (beam:env:harness_subprocess_python:v1, b'/home/ubuntu/venv_310/bin/python3 -m apache_beam.runners.worker.sdk_worker_main')
/home/ubuntu/venv_310/bin/python3: Error while finding module specification for 'apache_beam.runners.worker.sdk_worker_main' (ModuleNotFoundError: No module named 'apache_beam')
Exception in thread run_worker:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/tmp72rct58f/prefix/py_env-default/lib/python3.10/site-packages/apache_beam/runners/portability/local_job_service.py", line 228, in run
    raise RuntimeError(
RuntimeError: Worker subprocess exited with return code 1
Detected input queue delay longer than 300 seconds. Waiting to receive elements in input queue for instruction: bundle_1 for 300.05 seconds.
```
ranchodeluxe commented 8 months ago

> I actually got EVERYTHING 🥳 working on the runner side using the pre-commit API internally 😬 (that will change), but it seems to fail when submitting the job on the Beam Python SDK side of things 😞:

Solved with a simple patch, and I have ideas about a "real" solution... see the updated AC in the main conversation above.
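
For context on the failure above, a patch along these lines (a sketch of the idea, not the actual commit) could work: the subprocess SDK worker is launched with an interpreter that has no apache-beam installed, so overriding Beam's environment config to point at an interpreter that does would fix it. The venv path here is hypothetical:

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch only: point the subprocess SDK harness at the interpreter from the
# environment where apache-beam was actually installed, instead of whatever
# interpreter submitted the job.
recipe_python = "/tmp/recipe-venv/bin/python"
options = PipelineOptions(
    environment_type="beam:env:harness_subprocess_python:v1",
    environment_config=f"{recipe_python} -m apache_beam.runners.worker.sdk_worker_main",
)
```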

yuvipanda commented 8 months ago

fuckin love it

codecov[bot] commented 7 months ago

Codecov Report

Attention: Patch coverage is 96.77419%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 95.60%. Comparing base (340dc74) to head (19e5be8). Report is 2 commits behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| pangeo_forge_runner/commands/bake.py | 96.22% | 2 Missing :warning: |

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
+ Coverage   95.53%   95.60%   +0.07%
==========================================
  Files          14       15       +1
  Lines         493      501       +8
==========================================
+ Hits          471      479       +8
  Misses         22       22
```

:umbrella: View full report in Codecov by Sentry.

abarciauskas-bgse commented 7 months ago

@ranchodeluxe If I understand correctly, before this PR recipe library dependencies were included in pipelines via extra_options (via these lines: https://github.com/pangeo-forge/pangeo-forge-runner/blob/main/pangeo_forge_runner/commands/bake.py#L255-L264), and with this PR we install the recipe's library dependencies directly on the runner using "venvception", which ensures the libraries used by the runner are the same as those specified by the recipe.

If that understanding is correct, should we still include pangeo-forge-recipes versions in the integration test workflows (e.g. https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/.github/workflows/flink.yaml#L32), since they should be inferred from the requirements.txt in the test recipe (e.g. https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/tests/test-data/gpcp-from-gcs/feedstock-0.10.x-flink/requirements.txt)? I'm actually not sure how the command line argument --recipes-version is working such that https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/pangeo_forge_runner/commands/bake.py#L188 isn't failing.

ranchodeluxe commented 7 months ago

> @ranchodeluxe If I understand correctly, before this PR recipe library dependencies were included in pipelines via extra_options (via these lines: https://github.com/pangeo-forge/pangeo-forge-runner/blob/main/pangeo_forge_runner/commands/bake.py#L255-L264), and with this PR we install the recipe's library dependencies directly on the runner using "venvception", which ensures the libraries used by the runner are the same as those specified by the recipe.

That's mostly correct, except that not all of them were listed via extra_options. For example, pangeo-forge-recipes was never in there but is required to run bake, and the same goes for fsspec. So there were "hidden deps".
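
A minimal sketch of the "venvception" idea under discussion follows; the function name, paths, and re-invocation strategy are hypothetical (the PR itself uses the pre-commit API internally for this), but it shows the shape of the approach:

```python
import subprocess
import sys
import venv

def run_in_recipe_env(requirements_txt: str, env_dir: str = "/tmp/recipe-venv") -> None:
    """Build an isolated venv from the recipe's requirements.txt, then
    re-invoke the runner inside it so runner and recipe resolve one
    shared dependency set."""
    # Create a fresh virtual environment with pip available.
    venv.EnvBuilder(with_pip=True).create(env_dir)
    py = f"{env_dir}/bin/python"
    # Install the recipe's pinned deps plus the runner itself into the venv.
    subprocess.check_call(
        [py, "-m", "pip", "install", "-r", requirements_txt, "pangeo-forge-runner"]
    )
    # Hand the original CLI invocation off to the fresh environment.
    subprocess.check_call([f"{env_dir}/bin/pangeo-forge-runner"] + sys.argv[1:])
```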

> If that understanding is correct, should we still include pangeo-forge-recipes versions in the integration test workflows (e.g. https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/.github/workflows/flink.yaml#L32), since they should be inferred from the requirements.txt in the test recipe (e.g. https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/tests/test-data/gpcp-from-gcs/feedstock-0.10.x-flink/requirements.txt)?

Yeah, I assume we still want to include them there so we get parallel CI runs to test against.

> I'm actually not sure how the command line argument --recipes-version is working such that https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/pangeo_forge_runner/commands/bake.py#L188 isn't failing.

It's hard to see (I blame pytest 😆), but there are dependency-injected fixtures in tests/conftest.py that pass args such as --recipes-version for us. I'm positive those are working because I can break them by changing code in the matching tests/test-data/<recipe-version>/recipe.py. Plus, the previous merge to main should already be using some form of these, but I'm cleaning them up here.
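
For readers trying to trace this, the fixture wiring might look roughly like the following; this is a hypothetical sketch (fixture names, versions, and the --repo path are illustrative), not the actual tests/conftest.py:

```python
import pytest

# Parametrizing the fixture gives one test run per recipes version, which is
# how args like --recipes-version get supplied without ever appearing in the
# test bodies themselves.
@pytest.fixture(params=["0.9.x", "0.10.x"])
def recipes_version_ref(request):
    return request.param

@pytest.fixture
def bake_args(recipes_version_ref):
    # CLI args a test would pass through to `pangeo-forge-runner bake`.
    return [
        "--repo=./tests/test-data/gpcp-from-gcs",
        f"--recipes-version={recipes_version_ref}",
    ]
```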