Closed ranchodeluxe closed 7 months ago
I actually got EVERYTHING 🥳 working on the runner side using the pre-commit API internally 😬 (that will change), but it seems to fail when submitting the job on the Beam Python SDK side of things 😞:
```
==================== <function read_to_impulse at 0x7f8099bcaf80> ====================
==================== <function impulse_to_input at 0x7f8099bcb010> ====================
==================== <function sort_stages at 0x7f8099bcb250> ====================
==================== <function add_impulse_to_dangling_transforms at 0x7f8099bcb370> ====================
==================== <function setup_timer_mapping at 0x7f8099bcb1c0> ====================
==================== <function populate_data_channel_coders at 0x7f8099bcb2e0> ====================
starting control server on port 36945
starting data server on port 38025
starting state server on port 40469
starting logging server on port 46333
Created Worker handler <apache_beam.runners.portability.fn_api_runner.worker_handlers.SubprocessSdkWorkerHandler object at 0x7f80999cbfd0> for environment ref_Environment_default_environment_1 (beam:env:harness_subprocess_python:v1, b'/home/ubuntu/venv_310/bin/python3 -m apache_beam.runners.worker.sdk_worker_main')
/home/ubuntu/venv_310/bin/python3: Error while finding module specification for 'apache_beam.runners.worker.sdk_worker_main' (ModuleNotFoundError: No module named 'apache_beam')
Exception in thread run_worker:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/tmp72rct58f/prefix/py_env-default/lib/python3.10/site-packages/apache_beam/runners/portability/local_job_service.py", line 228, in run
    raise RuntimeError(
RuntimeError: Worker subprocess exited with return code 1
Detected input queue delay longer than 300 seconds. Waiting to receive elements in input queue for instruction: bundle_1 for 300.05 seconds.
```
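For context on the `ModuleNotFoundError` above: Beam's `SubprocessSdkWorkerHandler` shells out to an interpreter and runs `python -m apache_beam.runners.worker.sdk_worker_main`, so whatever interpreter it launches must itself be able to import `apache_beam`. A minimal way to check that precondition (the `interpreter_has_module` helper below is hypothetical, not part of Beam's API):

```python
import subprocess
import sys


def interpreter_has_module(python_exe: str, module: str) -> bool:
    """Return True if the given interpreter can import `module`.

    This mirrors what Beam's subprocess worker handler effectively
    requires: the interpreter it spawns must see apache_beam on its
    own sys.path, not the submitting process's.
    """
    result = subprocess.run(
        [python_exe, "-c", f"import {module}"],
        capture_output=True,
    )
    return result.returncode == 0
```

If `interpreter_has_module("/home/ubuntu/venv_310/bin/python3", "apache_beam")` is `False`, the worker subprocess will die exactly as in the log above.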
> I actually got EVERYTHING 🥳 working on the runner side using the pre-commit API internally 😬 (that will change), but it seems to fail when submitting the job on the Beam Python SDK side of things 😞:
Solved with a simple patch, and I have ideas about a "real" solution... see the updated AC in the main conversation above
fuckin love it
Attention: Patch coverage is 96.77419% with 2 lines in your changes missing coverage. Please review.
Project coverage is 95.60%. Comparing base (340dc74) to head (19e5be8). Report is 2 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| pangeo_forge_runner/commands/bake.py | 96.22% | 2 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
@ranchodeluxe If I understand correctly: before this PR, recipe library dependencies were included in pipelines via `extra_options` (via these lines: https://github.com/pangeo-forge/pangeo-forge-runner/blob/main/pangeo_forge_runner/commands/bake.py#L255-L264), and with this PR we are installing the recipe's library dependencies directly on the runner using `venvception`, which ensures the libraries used by the runner are the same as those specified by the recipe.
If that understanding is correct, should we include `pangeo-forge-recipes` versions in the integration test workflows (e.g. https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/.github/workflows/flink.yaml#L32), since they should be inferred from the requirements.txt in the test recipe? (e.g.: https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/tests/test-data/gpcp-from-gcs/feedstock-0.10.x-flink/requirements.txt). I'm actually not sure how the command line argument `--recipes-version` is working such that https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/pangeo_forge_runner/commands/bake.py#L188 isn't failing.
> @ranchodeluxe If I understand correctly: before this PR, recipe library dependencies were included in pipelines via `extra_options` (via these lines: https://github.com/pangeo-forge/pangeo-forge-runner/blob/main/pangeo_forge_runner/commands/bake.py#L255-L264), and with this PR we are installing the recipe's library dependencies directly on the runner using `venvception`, which ensures the libraries used by the runner are the same as specified by the recipe.
That's mostly correct, except that not all of them were listed via `extra_options`. For example, `pangeo-forge-recipes` was never in there but is required to run `bake`, as is `fsspec`. So there were "hidden deps".
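To illustrate, under this model a recipe's `requirements.txt` has to list the previously "hidden" deps explicitly alongside the recipe's own extras (the packages below are hypothetical examples, not the contents of the test recipe's file):

```text
# needed by the runner itself, previously implicit
pangeo-forge-recipes
apache-beam
fsspec
# fsspec provider + extras used by this recipe's transforms (examples)
gcsfs
```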
> If that understanding is correct, should we include `pangeo-forge-recipes` versions in the integration test workflows (e.g. https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/.github/workflows/flink.yaml#L32), since they should be inferred from the requirements.txt in the test recipe? (e.g.: https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/tests/test-data/gpcp-from-gcs/feedstock-0.10.x-flink/requirements.txt)
Yeah, I assume we still want to include them there so we get parallel CI runs to test against.
> I'm actually not sure how the command line argument `--recipes-version` is working such that https://github.com/pangeo-forge/pangeo-forge-runner/blob/gcorradini/infer_deps/pangeo_forge_runner/commands/bake.py#L188 isn't failing.
It's hard to see (I blame `pytest` 😆) but there are dependency-injected fixtures in `tests/conftest.py` that do the passing of args such as `--recipes-version` for us. I'm positive those are working b/c I can break them by changing code in the matching `tests/test-data/<recipe-version>/recipe.py`. Plus, the previous merge to `main` should already be using some form of these, but I'm cleaning them up here.
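A rough sketch of what such dependency-injected fixtures boil down to (the helper name and paths below are illustrative, not the actual `tests/conftest.py` contents): something assembles the argv so each test run gets `--recipes-version` passed for it without the test body mentioning it.

```python
def build_bake_argv(recipes_version: str, repo_path: str) -> list:
    """Assemble CLI args a test would pass to `pangeo-forge-runner bake`.

    Hypothetical helper: the real fixtures in tests/conftest.py are
    pytest fixtures, but the end result is an argv shaped like this.
    """
    return [
        "bake",
        "--repo",
        repo_path,
        "--recipes-version",
        recipes_version,
    ]


# A conftest.py fixture could then parametrize over recipe versions, e.g.:
# @pytest.fixture(params=["0.10.x"])
# def bake_argv(request):
#     return build_bake_argv(request.param, "tests/test-data/gpcp-from-gcs")
```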
Addresses https://github.com/pangeo-forge/pangeo-forge-runner/issues/154
Needs https://github.com/pangeo-forge/pangeo-forge-runner/pull/170
Goal
Developers 'should' only need to `pip install pangeo-forge-runner` (and the implicit deps in `pyproject.toml`). When they run a recipe, everything that's needed to run it is in the recipe's `requirements.txt`, including: `apache-beam`, `fsspec` providers, and any extra packages needed for their transforms.
Next Steps
POC works for `DirectRunner` in Beam:

- [x] We have a virtualenv library we are using called `venvception` that does all the path magic and `sys.executable` magic. It works with `DirectRunner`
- [x] ~~Instead of patching `sys.executable` to force Beam's `FnApiRunner` to sniff the correct executable we should try to subclass and override the correct incantation of classes as mentioned here~~ Nope, we're gonna patch this since we have to patch paths anyhow 😏
- [x] integration tests
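The `sys.executable` patching described in the checklist can be sketched as a context manager. This is a minimal illustration of the idea only; `venvception`'s real implementation also handles venv creation and path munging, and the function name here is hypothetical:

```python
import contextlib
import sys


@contextlib.contextmanager
def patched_executable(python_path: str):
    """Temporarily point sys.executable at another interpreter.

    While active, anything that spawns workers via sys.executable
    (e.g. Beam's subprocess SDK worker handler) will launch the
    given interpreter instead of the one running this process.
    """
    original = sys.executable
    sys.executable = python_path
    try:
        yield
    finally:
        # Always restore, even if the pipeline raised.
        sys.executable = original
```

Usage would look like running the Beam pipeline inside `with patched_executable("/path/to/recipe-venv/bin/python"):` so the subprocess worker is spawned from the recipe's venv, where `apache_beam` and the recipe deps are actually installed.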