pangeo-forge / staged-recipes

A place to submit pangeo-forge recipes before they become fully fledged pangeo-forge feedstocks
https://pangeo-forge.readthedocs.io/en/latest/
Apache License 2.0

WIP: ssebop #262

Open thodson-usgs opened 1 year ago

thodson-usgs commented 1 year ago

name: Recipe
about: Demonstrating pangeo forge pipeline to USGS.
title: SSEBOP

Dataset

SSEBOP is an evapotranspiration dataset covering CONUS at daily, 1 km² resolution.

thodson-usgs commented 1 year ago

Hmm, I was following the instructions from the README, but perhaps I should have created the issue before the PR.

thodson-usgs commented 1 year ago

So far I've been unable to run this using pangeo-forge-runner --prune with the Direct Runner. The run produces a lot of output, ending with grpc.FutureTimeoutError. I'm not sure whether this is a problem with the recipe, my environment, or my hardware.

norlandrhagen commented 1 year ago

Hey @thodson-usgs, taking a look at running your recipe locally. Are you on the ESIP slack?

thodson-usgs commented 1 year ago

@norlandrhagen, Yes, I'll follow up with you there.

For the record, you identified an error in my recipe; however, my run crashes before that point so I probably need to take a closer look at my configuration file.

Thanks!

norlandrhagen commented 1 year ago

Nice fix! I'm now running into:

AttributeError: 'ZipExtFile' object has no attribute 'size' [while running 'Create|OpenURLWithFSSpec|Preprocess|StoreToZarr/OpenURLWithFSSpec/MapWithConcurrencyLimit/open_url (max_concurrency=1)']
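
For readers unfamiliar with the class named in that traceback: ZipExtFile is what Python's standard-library zipfile module returns when a member of an archive is opened. The sketch below only illustrates why code that expects a .size attribute on such a handle would fail; it is not a diagnosis of this recipe, and the archive contents and the member name example.tif are made up for the demonstration.

```python
import io
import zipfile

# Build a small zip archive in memory; "example.tif" is a made-up member name.
payload = b"not a real GeoTIFF"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.tif", payload)

with zipfile.ZipFile(buffer) as archive:
    # Metadata about the member lives on the ZipInfo record.
    info = archive.getinfo("example.tif")
    with archive.open("example.tif") as member:
        member_type = type(member).__name__   # the class named in the traceback
        has_size = hasattr(member, "size")    # plain zipfile handles carry no .size
        data = member.read()

# The uncompressed size is available from ZipInfo instead.
member_size = info.file_size
```

Any wrapper that reads .size from the open handle therefore needs the handle to come from a layer that sets that attribute itself, or needs to take the size from the archive metadata.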

thodson-usgs commented 1 year ago

Interesting, I'd been getting an error about opening the zip, but not that one. In general, I've been testing on the several environments I have on hand: Ubuntu on WSL2, ESIP-nebari, and HPC. Each one gives a different error, which smells like an environment issue.

Next steps:

  1. I'll set max_concurrency=1 and, if that fails, avoid fsspec entirely and open the zip URL directly with rioxarray.
  2. Try this all on a clean Ubuntu machine.

One question, what type of system are you testing with?

And thank you again, @norlandrhagen
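
Step 1 above mentions opening the zip URL directly with rioxarray. A minimal sketch of one way to do that follows, assuming GDAL's /vsizip/ and /vsicurl/ virtual file systems are available in the environment. The URL, the member name, and both helper names are hypothetical, and the thread does not show the code the recipe actually used.

```python
def zipped_tif_vsi_path(zip_url: str, member: str) -> str:
    """Build a GDAL virtual path for a TIFF inside a remote zip archive."""
    # /vsicurl/ reads the archive over HTTP(S); /vsizip/ looks inside it.
    return f"/vsizip//vsicurl/{zip_url}/{member}"


def open_zipped_tif(zip_url: str, member: str):
    """Open one zipped TIFF with rioxarray (requires rioxarray and GDAL)."""
    import rioxarray  # imported lazily so the path helper works without it

    return rioxarray.open_rasterio(zipped_tif_vsi_path(zip_url, member))


# Hypothetical URL and member name, for illustration only.
path = zipped_tif_vsi_path("https://example.com/data/archive.zip", "band.tif")
```

Because GDAL handles both the download and the unzipping here, fsspec is not involved at all, which makes this a useful cross-check when the fsspec route fails.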

norlandrhagen commented 1 year ago

Ah, strange! Happy to help further. I'm on an M1 Mac. I'm creating a conda/mamba env and installing pangeo-forge-recipes plus rioxarray there.

thodson-usgs commented 1 year ago

Progress: I still don't understand why OpenURLWithFSSpec failed (this all worked fine when I tested with fsspec on its own), but I can open the zipped TIFs directly with rioxarray.

Now I get AttributeError: 'Dataset' object has no attribute 'encode' [while running 'Create|Preprocess|StoreToZarr/Preprocess/Map(_preproc)']

Maybe it's time to wade a bit deeper into Beam...

thodson-usgs commented 12 months ago

I changed one line and now the recipe runs without error.

From: def _preproc(item: Indexed[T]) -> Indexed[T]:
To: def _preproc(item: Indexed[T]) -> Indexed[xr.Dataset]:

At the next pangeo-forge meeting I'll follow up on why fsspec didn't work.
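
The one-line change above only touches the return annotation. The sketch below shows what that annotation declares, using stand-in types so it runs without xarray or pangeo-forge-recipes: Index, Indexed, Dataset, and both function names are placeholders, not the library's definitions. One possible explanation, not confirmed in the thread, is that Beam used the declared return type when handling the step's output, so an input of Indexed[str] combined with a declared output of Indexed[T] led it to treat the Dataset as a string.

```python
from typing import Tuple, TypeVar, get_args, get_type_hints

T = TypeVar("T")
Index = Tuple[int, ...]      # stand-in for the recipe's index type
Indexed = Tuple[Index, T]    # stand-in mirroring the Indexed[T] alias in the thread


class Dataset:
    """Stand-in for xr.Dataset so the sketch runs without xarray."""


def preproc_before(item: Indexed[T]) -> Indexed[T]:
    # Declares "same type out as in", yet returns a Dataset for a URL input.
    index, _url = item
    return index, Dataset()


def preproc_after(item: Indexed[T]) -> Indexed[Dataset]:
    # Declares the type that is actually returned.
    index, _url = item
    return index, Dataset()


# Inspect the element type each signature promises for the second tuple slot.
declared_before = get_args(get_type_hints(preproc_before)["return"])[1]
declared_after = get_args(get_type_hints(preproc_after)["return"])[1]
```

With the first signature the declared output element is the unbound type variable T, which resolves to whatever the input carried; with the second it is always the dataset type.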

ranchodeluxe commented 10 months ago

sorry about that title change foobar ☝️ @thodson-usgs 😆 I am going to try to run this on my cluster as a data point and was creating a ticket of a similar name in a different tab

thodson-usgs commented 10 months ago

@ranchodeluxe, this recipe was a bit of a test point for us as well. USGS has a legacy of zipping tiffs, and I was demonstrating that pangeo-forge could handle that pattern. We did get it working, but it might have exposed another bug (https://github.com/pangeo-forge/pangeo-forge-recipes/issues/659). And then I got sidetracked working on the flink runner. Feel free to close this.