pangeo-data / swot_adac_ogcms

Documentation and notebooks for the SWOT Adopt-a-Crossover Model Intercomparison
Apache License 2.0

Data loading errors #12

Open yangleir opened 1 year ago

yangleir commented 1 year ago

Hello,

Thank you to the authors for sharing this code; it is an excellent example for learning Pangeo Forge.

I ran into some problems loading data when running some of the notebooks in the Jupyter Book.

intake_demo.ipynb works well, with no errors. ds = cat[item](**params).to_dask() loads the data, and the figures plot nicely.

But in the other notebooks, the authors use code like:

w_path = f'{SCRATCH}/region01/HYCOM50/'+f'sigma0_fma.zarr'
sig0w = xr.open_zarr(gcs.get_mapper(w_path)).sig0.chunk({'lat':100,'lon':100})

or

fio01grid = xr.open_zarr(gcs.get_mapper(f"gcs://meom-ige-scratch/roxyboy/region01/FIO-COM32/grid.zarr"))

These calls fail with errors such as:

group not found at path ''

or

Forbidden: https://storage.googleapis.com/download/storage/v1/b/meom-ige-scratch/o/roxyboy%2Fregion01%2FFIO-COM32%2Fgrid.zarr%2F.zmetadata?alt=media
pangeo-hubs-prod@pangeo-integration-te-3eea.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object.

So, what is the reason for these error messages? Forgive me for my limited knowledge of Pangeo Forge and Google Cloud.

Also, what is the difference between the catalog's to_dask() and xr.open_zarr(gcs.get_mapper())? Why not use to_dask() throughout the code, since many Pangeo examples use it?

Regards, Lei

roxyboy commented 1 year ago

Hi @yangleir, thank you for reaching out.

The path w_path = f'{SCRATCH}/... points to our "local" scratch storage on our Google Cloud-based JupyterHub, so this directory is not public. I've tried to include in this repo all the notebooks needed to regenerate the files stored on the scratch storage, but if there is a file you'd like, please let me know :)
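Both errors reported above are consistent with a store whose root metadata cannot be read. A minimal sketch of how one might check a mapper before handing it to xr.open_zarr follows. The function name describe_store and the key list are illustrative and not part of this repo, and the sketch assumes the filesystem surfaces access failures as OSError (or a subclass such as PermissionError).

```python
from collections.abc import Mapping

# Keys that mark the root of a Zarr store (v2 layout, plus v3's zarr.json).
ZARR_ROOT_KEYS = (".zmetadata", ".zgroup", ".zarray", "zarr.json")


def describe_store(store: Mapping) -> str:
    """Return a short diagnosis of a mapping that should hold a Zarr store.

    Hypothetical helper: it only looks for root metadata keys and does not
    open or validate the data itself.
    """
    try:
        found = [key for key in ZARR_ROOT_KEYS if key in store]
    except OSError as err:
        # Assumption: a denied request is raised as OSError or a subclass.
        return f"no read access: {err}"
    if not found:
        return "no Zarr metadata at the store root (wrong path, empty store, or no access)"
    return "Zarr metadata found: " + ", ".join(found)
```

In a notebook this could be called as describe_store(gcs.get_mapper(w_path)) before xr.open_zarr, so that a private or empty path is reported in plain words.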

yangleir commented 1 year ago

Hi @roxyboy, thank you very much for the reply. I am from FIO (in your model list :) ) and a PhD student in PO. So the ocean models in your paper are not publicly available through Google Cloud, and I cannot run your notebooks as they are, right? But I can load them through ds = cat[item](**params).to_dask(). Does that mean I can still load all the data and use your code if I change how the data are loaded?

roxyboy commented 1 year ago

The model outputs are saved on public cloud storage managed by the National Science Foundation, not on Google Cloud, so I think you should be able to access all the model outputs listed in cat.

rabernat commented 1 year ago

Takaya, what is the difference between the data stored in OSN and the data stored in your private Google Cloud bucket?

I think we should be very clear about which parts of this code are reproducible by anyone and which require access to private data.

roxyboy commented 1 year ago

The data on the Google Cloud bucket are all post-processed data that I saved as intermediate results or for plotting purposes.

I've pushed all the notebooks I used for the paper so all the results should be reproducible from the raw model outputs stored on OSN, but it is true that other people would have to also reproduce the intermediate results.

I would save intermediate results when the analyses were heavy to re-run.
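The caching practice described above can be sketched as a "load the intermediate result, or rebuild it from the raw data" helper. This is only an illustration, not code from the notebooks: load_or_recompute and its parameters are hypothetical names, and which exception types signal a missing or private store depends on the libraries in use, so they are passed in by the caller.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def load_or_recompute(
    load_cached: Callable[[], T],
    recompute: Callable[[], T],
    save: Optional[Callable[[T], None]] = None,
    missing: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Return the cached intermediate result, or rebuild it from raw data.

    load_cached: tries to open the saved intermediate result.
    recompute:   rebuilds the result from the raw model outputs.
    save:        optional callback that stores a freshly computed result.
    missing:     exception types treated as "cache not available".
    """
    try:
        return load_cached()
    except missing as err:
        print(f"cached result unavailable ({err!r}); recomputing from raw data")
    result = recompute()
    if save is not None:
        save(result)
    return result
```

Here load_cached could wrap a call like xr.open_zarr(gcs.get_mapper(w_path)), and recompute could start from cat[item](**params).to_dask(), so a reader without access to the scratch bucket would fall through to the heavier computation.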

rabernat commented 1 year ago

I would say that is not clear from either the repo README or the paper. So as long as those notebooks are up, we will continue to confuse people like @yangleir who think they should be able to simply run the notebooks to reproduce the results.

roxyboy commented 1 year ago

I can update the README to make this clearer.

rabernat commented 1 year ago

It would be useful to explain how specifically one would re-create the intermediate data from the raw data. The goal is to make it clear how to go step-by-step from the raw data on OSN to the final figures in the paper.

roxyboy commented 1 year ago

I've started a PR #13 so maybe we could move the discussion there.

yangleir commented 1 year ago

Thank you all!!
It would be very helpful if the examples could be run without much difficulty.