pangeo-forge / gpcp-feedstock

A Pangeo Forge Feedstock for gpcp.
Apache License 2.0

Test on Dataflow #3

Closed. cisaacstern closed this 2 years ago.

cisaacstern commented 2 years ago

As tracked in https://github.com/pangeo-forge/gpcp-feedstock/issues/2, we're seeing scheduler memory issues on Prefect for this recipe, which is disappointing!

I'm curious whether this would succeed on Dataflow, but I'm not immediately sure how to route production runs there easily. So I'm starting with a PR (labeled `dev`, which routes it to Dataflow) to see whether a recipe test succeeds there with the version of the `to_beam` compiler present in 0.8.3.

pangeo-forge-bot commented 2 years ago

I'm having trouble finding your recipe module (i.e. Python file) in this PR.

Your meta.yaml recipes section currently includes a recipe declared as:

- id: gpcp
  object: recipe:recipe

The object here should conform to the format {recipe-module-name}:{recipe-object-name}.

In your PR I only see the following files:

['feedstock/meta.yaml']

...none of which end with recipe.py, which is unexpected given the object shown above.
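The check described above can be sketched as follows. This is a minimal illustration, assuming the bot simply splits the `object` string on `:` and looks for a changed file whose name ends in `{recipe-module-name}.py`; the function name `find_recipe_module` is hypothetical and is not Pangeo Forge Cloud's actual implementation.

```python
from typing import Optional


def find_recipe_module(object_spec: str, changed_files: list[str]) -> Optional[str]:
    """Return the changed file that provides the recipe module, or None.

    Hypothetical sketch: `object_spec` follows the
    {recipe-module-name}:{recipe-object-name} format used in meta.yaml.
    """
    module_name, sep, object_name = object_spec.partition(":")
    if not sep or not module_name or not object_name:
        raise ValueError(f"expected 'module:object', got {object_spec!r}")
    expected_suffix = f"{module_name}.py"
    # Like the bot's message, this only checks the file-name suffix.
    for path in changed_files:
        if path.endswith(expected_suffix):
            return path
    return None


# The situation in this PR: only meta.yaml changed, so no module is found.
print(find_recipe_module("recipe:recipe", ["feedstock/meta.yaml"]))
# Once the recipe file itself is touched, the module can be located.
print(find_recipe_module("recipe:recipe", ["feedstock/meta.yaml", "feedstock/recipe.py"]))
```

This mirrors why touching the recipe file (as done in the next comment) makes the recipe visible.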

Please help me find your recipe module by either:

cisaacstern commented 2 years ago

Oh, I need to actually make a change to the recipe for it to be visible to Pangeo Forge Cloud...

pangeo-forge-bot commented 2 years ago

:tada: New recipe runs created for the following recipes at sha 4794d968c7d455bd6eef857864a5821a56bf4817:

Note: This PR is deployed to Pangeo Forge Cloud's dev backend, for which a full frontend website is not currently available. The links below therefore point to plain text information about the created recipe run(s).

cisaacstern commented 2 years ago

/run recipe-test recipe_run_id=55
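The `/run recipe-test recipe_run_id=<N>` comment is a slash command addressed to the bot. Its shape can be illustrated with a small parser; this is a hypothetical sketch assuming whitespace-separated tokens with `key=value` arguments, and `parse_run_command` is not the bot's actual parsing code.

```python
def parse_run_command(comment: str) -> tuple[str, dict[str, str]]:
    """Split a '/run <action> key=value ...' comment into its parts (hypothetical)."""
    tokens = comment.split()
    if len(tokens) < 2 or tokens[0] != "/run":
        raise ValueError(f"not a /run command: {comment!r}")
    action = tokens[1]
    kwargs = {}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        kwargs[key] = value
    return action, kwargs


action, kwargs = parse_run_command("/run recipe-test recipe_run_id=55")
print(action, kwargs)  # recipe-test {'recipe_run_id': '55'}
```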

pangeo-forge-bot commented 2 years ago

:tada: New recipe runs created for the following recipes at sha d1b98dedb34fde420e83bafedc96f6ce48555975:

Note: This PR is deployed to Pangeo Forge Cloud's dev backend, for which a full frontend website is not currently available. The links below therefore point to plain text information about the created recipe run(s).

cisaacstern commented 2 years ago

/run recipe-test recipe_run_id=56

pangeo-forge-bot commented 2 years ago

:sparkles: A test of your recipe gpcp is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

Note: This test is deployed to Pangeo Forge Cloud's dev backend, for which public logs are not yet available.

cisaacstern commented 2 years ago

This is running on Dataflow 🤞

[Screenshot, 2022-07-13 at 4:02:37 PM]
pangeo-forge-bot commented 2 years ago

:partying_face: Hooray! The test execution of your recipe gpcp succeeded.

Here is a static representation of the dataset built by this recipe:

```
Dimensions:      (latitude: 180, nv: 2, longitude: 360, time: 2)
Coordinates:
    lat_bounds   (latitude, nv) float32 ...
  * latitude     (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0
    lon_bounds   (longitude, nv) float32 ...
  * longitude    (longitude) float32 0.0 1.0 2.0 3.0 ... 356.0 357.0 358.0 359.0
  * time         (time) datetime64[ns] 1996-10-01 1996-10-02
    time_bounds  (time, nv) datetime64[ns] ...
Dimensions without coordinates: nv
Data variables:
    precip       (time, latitude, longitude) float32 ...
Attributes: (12/45)
    Conventions:               CF-1.6, ACDD 1.3
    Metadata_Conventions:      CF-1.6, Unidata Dataset Discovery v1.0, NOAA ...
    acknowledgment:            This project was supported in part by a grant...
    cdm_data_type:             Grid
    cdr_program:               NOAA Climate Data Record Program for satellit...
    cdr_variable:              precipitation
    ...                        ...
    standard_name_vocabulary:  CF Standard Name Table (v41, 22 February 2017)
    summary:                   Global Precipitation Climatology Project (GPC...
    time_coverage_duration:    P1D
    time_coverage_end:         1996-10-01T23:59:59Z
    time_coverage_start:       1996-10-01T00:00:00Z
    title:                     Global Precipitation Climatatology Project (G...
```

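You can also open this dataset by running the following Python code in your Python interpreter of choice:

```python
import fsspec
import xarray as xr

# Public URL of the Zarr store written by this test run (recipe run 56)
dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/recipe-run-56/pangeo-forge/gpcp-feedstock/gpcp.zarr'
# Wrap the HTTP URL in a key-value mapping that xarray's Zarr backend can read
mapper = fsspec.get_mapper(dataset_public_url)
# consolidated=True reads all array metadata from a single .zmetadata file;
# the data themselves are loaded lazily
ds = xr.open_zarr(mapper, consolidated=True)
ds
```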

Checklist

Please copy-and-paste the list below into a new comment on this thread, and check the boxes off as you've reviewed them.

Note: This test execution is limited to two increments in the concatenation dimension, so you should expect the length of that dimension (e.g., "time" or equivalent) to be 2.

- [ ] Are the dimension lengths correct?
- [ ] Are all of the expected variables present?
- [ ] Does plotting the data produce a plot that looks like your dataset?
- [ ] Can you run a simple computation/reduction on the data and produce a plausible result?
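The first two checklist items can be automated with a small helper. This is a minimal sketch: `check_structure`, `EXPECTED_SIZES`, and `EXPECTED_VARIABLES` are hypothetical names, and the expected values are read off the static representation of this test run (so `time` is 2, not the full record length).

```python
from collections.abc import Iterable, Mapping


def check_structure(
    sizes: Mapping[str, int],
    variables: Iterable[str],
    expected_sizes: Mapping[str, int],
    expected_variables: Iterable[str],
) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Checklist item 1: are the dimension lengths correct?
    for dim, expected in expected_sizes.items():
        actual = sizes.get(dim)
        if actual != expected:
            problems.append(f"dimension {dim!r}: expected {expected}, got {actual}")
    # Checklist item 2: are all of the expected variables present?
    missing = set(expected_variables) - set(variables)
    for name in sorted(missing):
        problems.append(f"missing variable {name!r}")
    return problems


# Expected values for this two-increment test run (hypothetical constants).
EXPECTED_SIZES = {"latitude": 180, "longitude": 360, "nv": 2, "time": 2}
EXPECTED_VARIABLES = ["precip"]

# Demonstration with plain dicts standing in for `ds.sizes` and `ds.data_vars`:
print(check_structure(
    {"latitude": 180, "longitude": 360, "nv": 2, "time": 2},
    ["precip"],
    EXPECTED_SIZES,
    EXPECTED_VARIABLES,
))  # []
```

Against the real dataset, pass `ds.sizes` and `ds.data_vars` (both are mappings in xarray). For the last checklist item, a reduction such as `ds.precip.mean(dim=["latitude", "longitude"])` gives one global-mean value per day that can be eyeballed for plausibility.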
pangeo-forge-bot commented 2 years ago

:tada: New recipe runs created for the following recipes at sha 92af477c50278efb92b5e4bb4e4c5bdb83ca7dbb:

Note: This PR is deployed to Pangeo Forge Cloud's dev backend, for which a full frontend website is not currently available. The links below therefore point to plain text information about the created recipe run(s).

cisaacstern commented 2 years ago

@rabernat, here's an update:

  1. The test deployment to Dataflow worked 🥳
  2. I realized that I don't actually have a way to selectively route production runs to Dataflow. The notion of marking releases as beta was a holdover from when we thought we'd deploy production runs from tag events. Now that we're deploying production runs from push events to main, that selector feature has gone stale. As a workaround, in commit https://github.com/pangeo-forge/gpcp-feedstock/pull/3/commits/92af477c50278efb92b5e4bb4e4c5bdb83ca7dbb, I've forced all push events for this feedstock to carry the `is_beta` flag, which should route them to Dataflow.
  3. I'll now merge this, and we should see the production run go to Dataflow. I'll open a separate issue to track that production run, because Dataflow logs are not yet available via pangeo-forge.org.
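The workaround in point 2 amounts to a simple routing rule. The sketch below is hypothetical: `RunEvent`, `select_backend`, and `tag_push_event` are illustrative names, and the assumption that non-beta production runs go to Prefect is a reading of this thread (where production runs hit Prefect scheduler memory limits), not Pangeo Forge Cloud's actual code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunEvent:
    kind: str      # e.g. "push" (to main) or "pull_request"
    is_beta: bool  # flag assumed to select the Dataflow backend


def select_backend(event: RunEvent) -> str:
    """Hypothetical rule: beta-flagged runs go to Dataflow, all others to Prefect."""
    return "dataflow" if event.is_beta else "prefect"


def tag_push_event(event: RunEvent, force_beta: bool) -> RunEvent:
    """The workaround: force every push event for this feedstock to carry is_beta."""
    if event.kind == "push" and force_beta:
        return replace(event, is_beta=True)
    return event


push = RunEvent(kind="push", is_beta=False)
print(select_backend(push))                                   # prefect
print(select_backend(tag_push_event(push, force_beta=True)))  # dataflow
```

Because the flag is forced per feedstock rather than per release, every merge to main for this feedstock would take the Dataflow path until the selector is redesigned.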

I'd say this merge closes #2.