pangeo-forge / eNATL60-feedstock

Pangeo Forge feedstock for eNATL60.
https://pangeo-forge.org/dashboard/feedstock/87
Apache License 2.0

No datasets have been produced after first feedstock run #2

Open andersy005 opened 1 year ago

andersy005 commented 1 year ago

Hi @andersy005, @cisaacstern! No datasets have been produced since the pull request was merged, and I don't know what went wrong: https://pangeo-forge.org/dashboard/feedstock/87 only says that the status is failed... Do you have any insight into this? Is it possible to run it again?

Originally posted by @auraoupa in https://github.com/pangeo-forge/staged-recipes/issues/189#issuecomment-1296722166

andersy005 commented 1 year ago

@auraoupa, the failures appear to be caused by a transient connectivity issue: the source server dropped the connection while the recipe was caching an input file.

 Traceback (most recent call last):
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
      response = task()
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
      lambda: self.create_worker().do_instruction(request), request)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
      return getattr(self, request_type)(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
      bundle_processor.process_bundle(instruction_id))
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
      input_op_by_transform_id[element.transform_id].process_encoded(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
      self.output(decoded_value)
    File "apache_beam/runners/worker/operations.py", line 526, in apache_beam.runners.worker.operations.Operation.output
    File "apache_beam/runners/worker/operations.py", line 528, in apache_beam.runners.worker.operations.Operation.output
    File "apache_beam/runners/worker/operations.py", line 237, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
    File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
    File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1507, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
    File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
      config.storage_config.cache.cache_file(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
      _copy_btw_filesystems(input_opener, target_opener)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
      data = source.read(BLOCK_SIZE)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
      return super().read(length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
      out = self.cache._fetch(self.loc, self.loc + length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
      self.cache = self.fetcher(start, bend)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
      return sync(self.loop, func, *args, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
      raise return_result
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
      result[0] = await coro
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 624, in async_fetch_range
      r = await self.session.get(self.url, headers=headers, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
      await resp.start(conn)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 899, in start
      message, payload = await protocol.read()  # type: ignore[union-attr]
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
      await self._waiter
  aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
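Since `ServerDisconnectedError` is typically transient, one common mitigation is to retry the download with exponential backoff instead of failing the whole bundle on the first dropped connection. The sketch below is purely illustrative and not part of pangeo-forge-recipes; the `retry` helper and `FlakyFetch` stand-in are hypothetical names, and a real version would wrap the HTTP read (e.g. the fsspec open/read step) rather than a toy callable.

```python
import time


def retry(func, *, attempts=3, base_delay=0.01, retry_on=(Exception,)):
    """Call func(); on a transient error, back off exponentially and retry.

    Re-raises the last exception once `attempts` is exhausted.
    """
    for i in range(attempts):
        try:
            return func()
        except retry_on:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)  # 1x, 2x, 4x, ... base_delay


class FlakyFetch:
    """Simulated flaky download: fails `fail_times` times, then succeeds.

    Mimics a server that intermittently drops connections, as in the
    traceback above.
    """

    def __init__(self, fail_times):
        self.remaining = fail_times

    def __call__(self):
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError("Server disconnected")
        return b"chunk of data"


fetch = FlakyFetch(fail_times=2)
data = retry(fetch, attempts=5, retry_on=(ConnectionError,))
print(data)  # b'chunk of data'
```

In a recipe context the equivalent knob, where available, is to make the input-caching step tolerant of dropped connections (retrying the failed file rather than the whole pipeline), since re-running the entire job only to hit another transient disconnect is expensive.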
    File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
    File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
      config.storage_config.cache.cache_file(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
      _copy_btw_filesystems(input_opener, target_opener)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
      data = source.read(BLOCK_SIZE)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
      return super().read(length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
      out = self.cache._fetch(self.loc, self.loc + length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
      self.cache = self.fetcher(start, bend)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
      return sync(self.loop, func, *args, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
      raise return_result
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
      result[0] = await coro
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 624, in async_fetch_range
      r = await self.session.get(self.url, headers=headers, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
      await resp.start(conn)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 899, in start
      message, payload = await protocol.read()  # type: ignore[union-attr]
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
      await self._waiter
  aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
timestamp: '2022-10-31T15:16:38.664727404Z'

This is most likely the remote server being unhappy with many requests arriving concurrently (I presume this is amplified by Dataflow's autoscaling...). Unfortunately, I don't know how to address this issue, but I'll let others chime in. cc @rabernat / @martindurant / @yuvipanda / @alxmrs
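One way to keep a server happy is to cap how many requests are in flight at once. This is a minimal stdlib-only sketch of that idea (not Pangeo Forge code; the `MAX_CONCURRENT` value and the simulated fetch are placeholders for a real aiohttp GET):

```python
import asyncio

MAX_CONCURRENT = 3   # hypothetical cap; tune to what the server tolerates
peak = 0             # observed peak concurrency, for illustration
active = 0

async def fetch(url, sem):
    global peak, active
    async with sem:                  # at most MAX_CONCURRENT run at once
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)    # stand-in for the real HTTP request
        active -= 1
        return url

async def fetch_all(urls):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

results = asyncio.run(fetch_all([f"file_{i}.nc" for i in range(10)]))
```

All ten downloads complete, but never more than three overlap, regardless of how many workers the runner spins up per process.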

alxmrs commented 1 year ago

At first I thought you might need rate limiting in the pipeline (something like https://github.com/google/weather-tools/blob/0322cac4d679c105999a96cf9c3fced71e4561ae/weather_mv/loader_pipeline/util.py#L291; Charles and I have discussed this before on a separate issue). However, from the trace, it looks like this is an issue with copying data from their filesystem to ours. I'm interested to hear others' thoughts on the matter.
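For context, the core of such a rate limiter can be sketched in a few lines of plain Python (this is a hypothetical illustration, not the weather-tools implementation linked above; in a Beam pipeline, `wait()` would be called at the top of a `DoFn.process` before each remote request):

```python
import time

class RateLimiter:
    """Allow at most `rate` calls per second by spacing calls out."""

    def __init__(self, rate):
        self.min_interval = 1.0 / rate
        self.last = 0.0

    def wait(self):
        # Sleep just long enough to keep calls min_interval apart.
        now = time.monotonic()
        delay = self.min_interval - (now - self.last)
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()

limiter = RateLimiter(rate=50)   # hypothetical: 50 requests/second
start = time.monotonic()
for _ in range(5):
    limiter.wait()               # would precede each remote request
elapsed = time.monotonic() - start
```

Note this only limits each worker's own request rate; with an autoscaling runner, the aggregate rate still grows with the number of workers.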

martindurant commented 1 year ago

Note that async methods like `cat` accept a `batch_size` argument to control how many requests are sent at a time. Here, however, we are using the stateful file API, so parallelism is controlled entirely outside of fsspec.
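The effect of a `batch_size` argument can be sketched with stdlib asyncio alone (a rough illustration of the idea, not fsspec's actual code; `fake_cat` stands in for an HTTP range request):

```python
import asyncio

async def run_in_batches(make_coros, batch_size):
    """Run at most batch_size coroutines at a time, gathering each
    chunk fully before starting the next."""
    out = []
    for i in range(0, len(make_coros), batch_size):
        chunk = make_coros[i:i + batch_size]
        out.extend(await asyncio.gather(*(fn() for fn in chunk)))
    return out

async def fake_cat(path):
    await asyncio.sleep(0.005)   # stand-in for an HTTP range request
    return path

paths = [f"chunk_{i}" for i in range(8)]
results = asyncio.run(
    run_in_batches([lambda p=p: fake_cat(p) for p in paths], batch_size=2)
)
```

With `batch_size=2`, the eight requests are issued two at a time instead of all at once, while results still come back in order.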

auraoupa commented 1 year ago

Hi @andersy005! I think I know why there are connectivity issues on our OPeNDAP server: there is a lot of traffic on the same network (only one graphics card for both transfers and computations...). What I could do is book the machine for a time slot so the Pangeo Forge operation can run; could you indicate a day and a time when you would be able to launch it again?

andersy005 commented 1 year ago

Thank you for looking into this, @auraoupa! I'm available all day today and tomorrow, and would be happy to help with the new recipe runs. Ping me whenever you are ready for us to try again.

auraoupa commented 1 year ago

OK, great! Actually, today is a slow day on the machine; could you give it a try now? Thanks @andersy005

auraoupa commented 1 year ago

Hi @andersy005, I have not given up on these recipes yet! There have been some modifications to our OPeNDAP server to fix the connectivity issues; could you give it another try? If it still does not work, I will find another place to host the data... Thanks for your help!

auraoupa commented 1 year ago

Hi @andersy005, @cisaacstern, @rabernat! Sorry to be pushy, but could you please give this recipe one last try? If it still does not work, I will create a new recipe with a different OPeNDAP host... Thanks!

auraoupa commented 1 year ago

Hi @cisaacstern and @yuvipanda! Would it be possible to try my recipe one last time, so I know whether the OPeNDAP server on which my data are currently hosted still has connectivity issues? Thanks

cisaacstern commented 1 year ago

👋 Hi @auraoupa, thanks for being persistent here, and apologies for the (terribly) delayed reply. As you can see, Pangeo Forge (both the software and the community) does not support time-sensitive requests particularly well. In part, this is a product of our very small maintainer pool; it is also partly an assumption of the platform design that the public data we pull will be (more or less) "always available" ... an assumption that breaks down, of course, when pulling from bandwidth-constrained sources.

All that being said, I am of course happy to trigger a re-run now, which I will do by opening (then merging) a PR which makes some arbitrary change to the code. (A merged PR is currently our only switch for triggering a new run.) We can check back on this issue when this new run completes.

And again, apologies for the tremendous delay and thank you for keeping us accountable here.

cisaacstern commented 1 year ago

@auraoupa, the deployments triggered by merging #5 have all failed, despite the pruned subset of each of these recipes having just succeeded in the tests I ran from the discussion thread on #5 (which you can see there).

The error I am seeing in the backend logs is consistent with the one above:

aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']

So this would still seem to be a concurrency/bandwidth issue with the source file server. Concurrency limiting is a valuable feature which we should have in Pangeo Forge, but simply have not had the developer time to build yet.

auraoupa commented 1 year ago

Thanks @cisaacstern for this test! I guess I have to find another place to store the data, then... One idea, though: do you think it could help if we tried just one of the 3 sub-recipes? Or rearranged the files so that there are not so many of them?

cisaacstern commented 1 year ago

@auraoupa, running just one of the sub-recipes is a good thought, though unfortunately is not currently supported. (This would be a good future feature to develop under the general heading of concurrency limits.)

Or rearranged the files so that there are not so many of them?

This is a promising idea. The production run will make one request per file, so yes, reducing file count will also reduce concurrency. If files are too large, however, we run the risk of long-running transfers with dropped connections.

How many files (of what sizes) does each sub-recipe currently have?

As a general guideline, I'd say if we can reduce number of files by at least 5x without pushing per-file sizes over 10 GB, it's worth a shot.
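A quick back-of-envelope check of that guideline, using the numbers from this thread (one year of daily files per dataset, consolidated into monthly files; the ~8 GB monthly size is the figure proposed later in the discussion):

```python
daily_files = 365        # one year of daily files per dataset
monthly_files = 12       # consolidated to one file per month
monthly_size_gb = 8      # approximate size of each monthly file

# Going from daily to monthly files cuts the request count ~30x,
# comfortably past the 5x target, without exceeding 10 GB per file.
reduction = daily_files / monthly_files   # ≈ 30.4
```

So monthly consolidation would satisfy both constraints with room to spare.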

auraoupa commented 1 year ago

Thanks @cisaacstern for the suggestions! I will then make monthly files instead of daily ones, and maybe submit a new recipe with only one dataset at a time: that will be 12 files (instead of 3 × 365 files), around 8 GB each!

cisaacstern commented 1 year ago

@auraoupa, sounds good... this could conceivably work!

Perhaps this is clear, but in case not, please make the PR as an edit to the file feedstock/recipe.py in this repo (not to staged-recipes).

auraoupa commented 1 year ago

Hi @cisaacstern, I hope you had a nice end of year, and I wish you the best for 2023! I rewrote the recipe in pull request #6 so we can try with fewer files at a time; could you please merge it? Thanks!

auraoupa commented 1 year ago

Hi @cisaacstern! One last try for this recipe in pull request #6: this time the files are smaller than 2 GB each, and there are 73 of them...