Open DennisHeimbigner opened 5 years ago
@DennisHeimbigner - this dataset is on GCS but may work for you: https://storage.googleapis.com/pangeo-data/ecco/eccov4r3/
That prefix, https://storage.googleapis.com/pangeo-data, is readable, so it may work. Thanks.
There's also a fixture
directory in this repo, which has some dummy data for testing/validating the format/spec is still met during testing. This may be useful for seeding your own S3 bucket. Though it is quite small.
Unfortunately, these do not appear to reside on S3 itself.
Right. This data is in GCS. Perhaps @jacobtomlinson knows of a public s3 zarr out there?
No, but I can make one if you like?
That would be helpful if you did. It does not have to be complex, I am just trying to get the basic access correct.
The S3 example in the zarr tutorial uses a very small toy dataset that is publicly accessible. Bucket is here: http://zarr-demo.s3-eu-west-2.amazonaws.com/
Would there be any interest in having a minio-based (https://www.minio.io/) setup using Docker within Travis so that S3 tests could be run? This would carry an s3fs requirement, at least at the testing scope.
Edit: Looks like gh-293 may either make this unnecessary or be a good template for adding this for an AWS clone.
Sorry for the slow follow-up here. I think this would be excellent. I had been concerned that the cloud storage class implementations that live outside the zarr code base were not being put through the test suite, but this would solve that very nicely. I think #293 provides a template, but it would need a new PR to add test coverage for AWS S3 via s3fs.S3Map.
Also, I noticed recently that GCS now has support for local emulation, so it should be possible to get something for GCS too via gcsfs.GCSMap. That could be done separately from the open PR to implement a GCS storage class via the official Python SDK (#252), which would be nice to finish but is a parallel piece of work.
GCS has support now for local emulation
How? Where? I'd love to see it. I think I saw this mentioned elsewhere.
@joshmoore, you don't need minio; you can more easily use moto, which is what the s3fs tests use.
Re emulation: sorry, I think I got confused. I had seen this page about emulation for Google Cloud Datastore, but of course that's something completely different from Google Cloud Storage.
you don't need minio, you can more easily use moto, which is what the s3fs tests use.
Thanks, @martindurant, I hadn't seen moto before. Happy to have the tests use whatever's appropriate in this repo, especially if mocking is preferred to integration tests. For me, the minio setup is also useful for more production-like testing. Would you also suggest using moto in server mode for that?
I don't see why not. Moto lacks some rather specific features such as file versioning, but is pretty complete. minio also isn't exactly S3...
The S3 example in the zarr tutorial uses a very small toy dataset that is publicly accessible. Bucket is here: http://zarr-demo.s3-eu-west-2.amazonaws.com/
We are currently implementing an S3 backend for our Julia zarr package https://github.com/meggart/ZarrNative.jl/commits/S3storage . I wanted to ask if it is ok to use the dataset you mention here for our unit tests?
Yes of course. Also happy to give you write access and/or put more test datasets there if it would be useful.
@alimanfoo Regarding this S3 example, what is the file format of the zarr-demo data? I've tried placing a .zarr file (directory) on S3, and I am having issues accessing it.
@mhearne-usgs : see also https://github.com/martindurant/zarr/pull/1/files for an example of following @martindurant's moto suggestion.
I am in the process of constructing the initial netcdf-c library handler for the Zarr format. As part of this, I need to verify my assumptions about the mapping of the storage to S3. Are there any anonymously accessible zarr datasets that I can access (read-only)?
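For what it's worth, the key mapping itself can be sketched without any cloud access: a zarr v2 array is a `.zarray` metadata object plus one object per chunk, named by dotted grid indices, all stored under a common prefix. A small, self-contained illustration (the helper function here is mine, not part of any library):

```python
import itertools

def chunk_keys(shape, chunks):
    """Enumerate the chunk object names for a zarr v2 array:
    one key per chunk, dotted grid indices, row-major order."""
    grid = [range(-(-s // c)) for s, c in zip(shape, chunks)]  # ceil(s / c)
    return [".".join(map(str, idx)) for idx in itertools.product(*grid)]

# A 10x10 array stored in 5x5 chunks occupies a 2x2 chunk grid,
# so the S3 objects under the array's prefix would be:
keys = [".zarray"] + chunk_keys([10, 10], [5, 5])
print(keys)  # ['.zarray', '0.0', '0.1', '1.0', '1.1']
```

On S3 each of those keys simply becomes one object name under the array's prefix; there is no extra index or manifest to maintain.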