zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License
1.52k stars 282 forks

Zip sample files for reduced file listing #1246

Open joshmoore opened 2 years ago

joshmoore commented 2 years ago

see: https://zenodo.org/record/7174882#.Y2LRxOzMIeY

To simplify the listing of the source code, one option (to be discussed) would be to zip all of the sample files. The unit tests could then unzip them to a temporary location.
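As a rough illustration of the zipping step, here is a minimal sketch using only the standard library. The helper name `zip_fixture` and the paths are hypothetical and are not part of zarr-python; the only assumption is that the sample files live in a single directory.

```python
import shutil
from pathlib import Path


def zip_fixture(fixture_dir: Path, archive_base: Path) -> Path:
    """Pack fixture_dir into <archive_base>.zip and return the archive path.

    Hypothetical helper for illustration only.
    """
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # so unpacking recreates "<folder>/..." rather than loose files.
    archive = shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=str(fixture_dir.parent),
        base_dir=fixture_dir.name,
    )
    return Path(archive)
```

The archive should be written outside the directory being zipped, otherwise the partially written zip file could end up inside itself.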

See also:

sharmadhiraj86 commented 1 year ago

Hi @joshmoore, I would like to work on this issue. Please assign it to me if possible.

joshmoore commented 1 year ago

Two quick thoughts, @sharmadhiraj86:

Thanks!

sharmadhiraj86 commented 1 year ago

Hi @joshmoore, could you please tell me specifically which files to zip?

MSanKeys963 commented 1 year ago

Hi @sharmadhiraj86, thanks for showing interest in this issue. Let me explain what we're trying to do here:

Currently, the Zenodo preview of the files in the zarr-python repository is not useful, because of the .zarr data in the fixture folder. Ideally, the preview should display all of the files in zarr-python, especially the zarr folder, as it contains the actual code.

The fixture folder contains .zarr data that is used when we run python -m pytest -v zarr. That command runs the existing tests under /zarr/tests against the Zarr codebase so that we can see whether any of them fail and fix them afterwards.

@joshmoore's suggestion is to zip all the existing .zarr data under the fixture folder so that the preview shows all the files, including the source code. The second part is to write a simple unit test that unzips the data under fixture to a temporary location other than the root, uses the data during testing, and then cleans it up after all the tests are green.
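The unzip, use, and clean-up cycle described above could look roughly like the following. This is only a sketch under stated assumptions: `unpacked_fixture` is a hypothetical name, the archive layout is made up, and the real test suite may organise this differently.

```python
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def unpacked_fixture(archive: Path) -> Iterator[Path]:
    """Extract archive into a fresh temporary directory and yield that directory.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory(prefix="zarr-fixture-") as tmp:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        # Tests read from the temporary copy; the directory is removed
        # when the block exits, even if a test raised an exception.
        yield Path(tmp)
```

In a pytest suite the same logic could be wrapped in a session-scoped fixture, for example one that extracts into a directory obtained from pytest's built-in `tmp_path_factory`, so the data is unpacked once per test run.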

Also, we're not sure whether everyone in the Zarr community, including the core devs, would prefer zipping the data under fixture, and we'd like to discuss that once the PR is submitted.

Please feel free to send a PR that covers both the zipping of the data and the unit tests that unzip it afterwards. The tests would ideally live under /zarr/tests. Let me know if you have any more questions.

CC: @jakirkham

joshmoore commented 1 year ago

:+1: for @MSanKeys963's description, but for others, a question:

Would this be the time to consider using pooch, like xarray does? https://github.com/pydata/xarray/pull/4102 etc.
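For readers unfamiliar with that approach: the general idea is to keep sample data out of the repository, list each file with a checksum in a registry, and download it into a local cache on first use. The sketch below illustrates that idea with the standard library only. It is not pooch's API, and `fetch`, the registry contents, and the base URL are all hypothetical.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch(name: str, registry: dict[str, str], base_url: str, cache_dir: Path) -> Path:
    """Return a cached local copy of name, downloading and verifying it if needed.

    Hypothetical helper for illustration only; registry maps file names
    to their expected SHA-256 hex digests.
    """
    target = cache_dir / name
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(base_url + name) as response:
            data = response.read()
        # Refuse to cache a file whose content does not match the registry.
        digest = hashlib.sha256(data).hexdigest()
        if digest != registry[name]:
            raise ValueError(f"checksum mismatch for {name}")
        target.write_bytes(data)
    return target
```

The trade-off compared with zipped files in the repository is that tests then need network access (or a pre-populated cache) the first time they run.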