leap-stc / climsim_feedstock

Issues with Virtualizarr on Climsim feedstock #10

Open SammyAgrawal opened 2 months ago

SammyAgrawal commented 2 months ago

I'm getting a strange error internally when virtualizarr uses fsspec. It works when I use fsspec manually, though, so the cause might be the way virtualizarr opens the file through the filesystem.

import fsspec
import xarray as xr
from virtualizarr import open_virtual_dataset

sample_url = 'https://huggingface.co/datasets/LEAP/ClimSim_low-res-expanded/resolve/main/train/0001-02/E3SM-MMF.mlo.0001-02-02-06000.nc'
with fsspec.open(sample_url, mode='rb') as file:
    ds1 = xr.open_dataset(file, use_cftime=True, chunks={})  # works
    with open("file.nc", 'wb') as f:
        f.write(file.read())  # save a local copy as "file.nc"

vds1 = open_virtual_dataset("file.nc")  # works
vds2 = open_virtual_dataset(sample_url)  # fails

From the error stack trace, the problem seems to occur when fsspec.filesystem(protocol, **storage_options).open(filepath) is called internally. The eventual GET request fails, raising a FileNotFoundError for the URL.
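
One way to narrow this down outside virtualizarr is to open the URL both ways and record what each attempt raises. This is only a minimal sketch, not virtualizarr's actual code: `try_step` and `compare_open_paths` are hypothetical helper names, no storage_options are passed, and the protocol is assumed to be the URL scheme.

```python
from urllib.parse import urlsplit


def try_step(label, func):
    """Run one step and report either its result or the exception it raised."""
    try:
        return label, "ok", func()
    except Exception as exc:  # broad on purpose: this is a diagnostic helper
        return label, type(exc).__name__, str(exc)


def compare_open_paths(url):
    """Open `url` via fsspec.open and via fsspec.filesystem(...).open."""
    import fsspec  # imported lazily so try_step stays usable without fsspec

    protocol = urlsplit(url).scheme  # assumption: protocol == URL scheme

    def via_open():
        # The manual route that works in the comment above.
        with fsspec.open(url, mode="rb") as f:
            return f.read(4)

        
    def via_filesystem():
        # Mirrors the call pattern named in the stack trace.
        fs = fsspec.filesystem(protocol)
        with fs.open(url) as f:
            return f.read(4)

    return [
        try_step("fsspec.open", via_open),
        try_step("fsspec.filesystem(...).open", via_filesystem),
    ]
```

Calling `compare_open_paths(sample_url)` returns one `(label, status, detail)` tuple per route, so a failure in only the second route would point at how the filesystem is constructed rather than at the URL.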

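SammyAgrawal commented 2 months ago

import requests

# sample_url and open_virtual_dataset are defined in the previous comment
resp = requests.get(sample_url)
with open('file2.nc', 'wb') as f:
    f.write(resp.content)  # save the downloaded bytes as "file2.nc"
open_virtual_dataset("file2.nc")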

This works, so the URL itself is clearly accessible. But the FileNotFoundError above is raised from fsspec/implementations/http.py:435, in HTTPFileSystem._info(self, url, **kwargs).
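
Since the traceback ends in `_info`, the metadata lookup can be tested on its own, separately from reading the file. The sketch below is an assumption about how to isolate it, not a confirmed reproduction: `describe_info_call` and `check_with_fsspec` are hypothetical helpers, and the filesystem is created with default options rather than whatever virtualizarr passes.

```python
def describe_info_call(url, info_func):
    """Call an info-style function on `url` and summarise the outcome."""
    try:
        details = info_func(url)
    except FileNotFoundError as exc:
        return {"url": url, "found": False, "error": str(exc)}
    # fsspec info dicts normally carry a "size" entry; it may be None.
    return {"url": url, "found": True, "size": details.get("size")}


def check_with_fsspec(url):
    """Run the metadata lookup through fsspec's HTTP filesystem."""
    import fsspec  # imported lazily so the helper above has no dependencies

    fs = fsspec.filesystem("https")  # default options, an assumption
    return describe_info_call(url, fs.info)
```

If `check_with_fsspec(sample_url)` reports `found: False` while the `requests.get` download succeeds, the difference lies in the requests fsspec issues for the lookup rather than in the file itself.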