Open Castronova opened 3 years ago
Same issue here.
I've also tried the example in README without success.
The browsable catalog on pangeo.io seems to be affected too.
Sorry for the slow reply. Thanks for reporting these errors.
I am unable to reproduce this error. On https://staging.us-central1-b.gcp.pangeo.io/ with intake version 0.6.2, I was able to run
from intake import open_catalog
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml")
without errors.
I've also tried the example in README without success.
The README was using an old-style intake syntax. The preferred usage is open_catalog. In #126 I have updated the README.
The browsable catalog on pangeo.io seems to be affected too.
I'm not seeing any problems right now.
I wonder if this was a GitHub glitch. @Castronova could you try again?
Can confirm the browsable catalog works.
However, running a fresh install of intake version 0.6.2, I still get an error running your snippet:
from intake import open_catalog
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml")
🤔 I'm confused. Maybe @martindurant has some ideas?
What version of fsspec is this?
import fsspec
print(fsspec.__version__)
# -> 2021.04.0
I was running fsspec version 0.9.0.
After upgrading to 2021.04.0 everything works as expected!
Thanks
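For anyone else landing here, a rough way to check whether an installed fsspec is at or past the version reported to work above. This is only a sketch: `version_tuple` and `is_known_good` are hypothetical helpers, not part of fsspec or intake, and the 2021.04.0 threshold is an assumption taken from the report in this thread.

```python
import re

def version_tuple(version: str) -> tuple:
    """Turn a dotted version string into a tuple of ints for comparison."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component (e.g. "dev0").
            break
        parts.append(int(match.group()))
    return tuple(parts)

# Assumption: 2021.04.0 is the first version reported to work in this thread.
KNOWN_GOOD = version_tuple("2021.04.0")

def is_known_good(version: str) -> bool:
    """Return True if the version is at or above the assumed threshold."""
    return version_tuple(version) >= KNOWN_GOOD
```

In practice you would pass in `fsspec.__version__`, e.g. `is_known_good(fsspec.__version__)`, and upgrade fsspec if it returns False.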
Sorry for the friction!
Martin, can you help me understand the source of this error better? I don't understand why fsspec is involved here. And since it is involved, why is a compatible version not a required dependency of intake?
Note that the following is equivalent and probably more likely to succeed on all versions:
cat = intake.open_catalog("github://pangeo-data:pangeo-datastore@/intake-catalogs/master.yaml")
@rabernat: in older versions of fsspec, files that were smaller than a blocksize were always downloaded in one go and thereafter read from an in-memory BytesIO (but the error would have shown up for larger files). This bypassed any chance for file caching. It was changed in 0.9.0 to use the standard fetching mechanism. Unfortunately, that makes use of the apparent file size. GitHub reports the file size of the gzipped version of the file, which is smaller than the real size, so you only get part of the file. In 2021.04.1, we explicitly ask for the size without compression ("Accept-Encoding: identity"), which gets the right value. Arguably, Intake should use fs.cat instead of open().read(), making fewer assumptions.
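To illustrate the size mismatch described above, here is a minimal standard-library sketch. It does not touch fsspec or GitHub: the catalog text is made up, and `reported_size` simply stands in for the gzipped length that a server might report. It only shows why trusting that number yields a partial file.

```python
import gzip
import io

# Made-up stand-in for a catalog file (an assumption, not the real master.yaml).
full_body = (
    "sources:\n"
    + "".join(f"  dataset_{i}:\n    driver: zarr\n" for i in range(50))
).encode()

# The size the server would report if it measured the gzipped payload.
reported_size = len(gzip.compress(full_body))

# A reader that trusts the reported size stops early.
truncated = io.BytesIO(full_body).read(reported_size)

# Reading until end-of-stream makes no assumption about the size.
complete = io.BytesIO(full_body).read()
```

Here `truncated` is only a prefix of the YAML, which a parser would then reject or misread, while `complete` holds the whole file.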
I'm getting an error when loading the intake catalog as described in https://catalog.pangeo.io/ and https://github.com/pangeo-data/pangeo-datastore/blob/master/README.md.
gives the following exception:
Any help is greatly appreciated.