pangeo-forge / pangeo-forge-orchestrator

Database API and GitHub App backend for Pangeo Forge Cloud.
https://api.pangeo-forge.org/docs
Apache License 2.0

Add kerchunk opener to `repr` route #200

Closed cisaacstern closed 11 months ago

cisaacstern commented 1 year ago

The recently-added https://github.com/pangeo-forge/liveocean-feedstock had a successful production run:

https://pangeo-forge.org/dashboard/recipe-run/1391?feedstock_id=91

Our xarray repr generation route doesn't support kerchunk, so instead of a dataset repr the frontend is displaying:

status code: 404
An error occurred while fetching data from URL: https://api.pangeo-forge.org/repr/xarray/?url=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/liveocean-feedstock/liveocean.zarr
{"detail":"An error occurred while fetching the data from URL: https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/liveocean-feedstock/liveocean.zarr. Dataset not found."}

Let's add some conditional logic for that rather than assuming all datasets are zarr, here:

https://github.com/pangeo-forge/pangeo-forge-orchestrator/blob/fb5caaf6994d1b73004b447ab00e5687ab941f2e/pangeo_forge_orchestrator/routers/repr.py#L24-L25
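A minimal sketch of what that conditional logic could look like, not the route's actual code. It assumes the caller can say whether the target is a zarr store or a kerchunk reference. The names `build_open_args`, `open_for_repr`, `dataset_type`, and `remote_protocol` are hypothetical. The kerchunk branch uses the common pattern of opening a reference file through fsspec's `reference://` filesystem with xarray's zarr engine. The `"https"` default for `remote_protocol` is a guess and should match wherever the referenced chunks actually live.

```python
from typing import Any, Dict, Tuple


def build_open_args(
    url: str,
    dataset_type: str = "zarr",       # hypothetical flag: "zarr" or "kerchunk"
    remote_protocol: str = "https",   # assumption: protocol of the referenced chunks
) -> Tuple[str, Dict[str, Any]]:
    """Return (store, kwargs) suitable for passing to xarray.open_dataset."""
    if dataset_type == "zarr":
        # Plain zarr store: open the URL directly, lazily.
        return url, {"engine": "zarr", "chunks": {}}
    if dataset_type == "kerchunk":
        # Kerchunk reference: the URL points at a reference file, which
        # fsspec's reference filesystem exposes as a virtual zarr store.
        return "reference://", {
            "engine": "zarr",
            "chunks": {},
            "backend_kwargs": {
                "consolidated": False,
                "storage_options": {
                    "fo": url,
                    "remote_protocol": remote_protocol,
                },
            },
        }
    raise ValueError(f"Unsupported dataset type: {dataset_type!r}")


def open_for_repr(url: str, dataset_type: str = "zarr"):
    """Open the dataset lazily so that its repr can be generated."""
    import xarray as xr  # imported here so the helper above stays dependency-free

    store, kwargs = build_open_args(url, dataset_type)
    return xr.open_dataset(store, **kwargs)
```

Taking the type as an explicit argument instead of inferring it from the URL is deliberate: as noted below, the liveocean target carries a `.zarr` suffix even though it is a kerchunk reference, so the suffix alone can't be trusted.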

xref #198 #199

cc @rsignell-usgs this is why the frontend is not displaying the dataset repr for liveocean. But I think https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/liveocean-feedstock/liveocean.zarr should be your full dataset (despite the inaccurate .zarr suffix, which is because of #199)?

katamartin commented 1 year ago

👋 gentle bump on this issue!

It looks like this is now affecting the first two recipes on the frontend, which makes it fairly prominent.

cisaacstern commented 1 year ago

Thanks for the ping @katamartin.

@andersy005, noticed your 👍, do you have interest in + bandwidth to work on this?

andersy005 commented 1 year ago

@cisaacstern, this is on my TODO list, but won't be able to work on it until later next week :)

cisaacstern commented 1 year ago

Sounds great. I'll assign you here, @andersy005. From my perspective, no stress re: timeline. Of course lmk if you end up needing feedback/review on this.

cisaacstern commented 1 year ago

@katamartin, out of curiosity, does next/vercel offer the possibility of running any type of integration testing against a live instance of our backend application? For every PR, we have the option of deploying a review instance of the backend, and also merges to main deploy to a staging instance.

In #226, I am working on an integration test wiring these instances to Google Dataflow, as part of debugging #220. Similarly, should we consider some scheduled tests of the frontend against these dev/staging instances of the backend? I am imagining some future where we require checks such as:

✅ Dataflow integration passing
✅ Next.js frontend integration passing

Before we deploy to prod. (Anderson maybe you have some insight on how to do this too. When we spoke the other week, I think we agreed that frontend integration testing was not the highest priority, but here we have a prime example of when it would have helped already 😅.)
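As a rough illustration of the kind of check this could involve, here is a sketch of a smoke test against the repr route that the frontend calls. It is not a frontend integration test, only a backend-side probe that would have surfaced the 404 above. The helper names and the `backend_base` parameter are hypothetical. The `/repr/xarray/?url=...` route shape is taken from the error message earlier in this issue. The dataset URL is percent-encoded here, which is an assumption about what the route accepts.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def repr_route_url(backend_base: str, dataset_url: str) -> str:
    """Build the repr-route URL for a given backend instance (review, staging, prod)."""
    query = urlencode({"url": dataset_url})  # percent-encodes the dataset URL
    return f"{backend_base.rstrip('/')}/repr/xarray/?{query}"


def repr_route_status(backend_base: str, dataset_url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code the repr route gives for a dataset URL."""
    try:
        with urlopen(repr_route_url(backend_base, dataset_url), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # Non-2xx responses (such as the 404 above) raise HTTPError; report the code.
        return err.code
```

A scheduled job could call `repr_route_status` for each published dataset against the staging instance and fail if any status is not 200.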

katamartin commented 1 year ago

@cisaacstern ooh can definitely see how that would be useful here! Unfortunately, not sure that there are great out-of-the-box options.

As I see it, we would need to wire up deploys for the pangeo-forge.org project to this repo and get those frontend deploys to point to the backend preview deploy URLs. It looks like there's definitely no Vercel tooling for the latter (https://github.com/vercel/vercel/discussions/5231) and I'm not sure how well-supported the former would be either.