zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

Keys not URL-decoded when loaded over the network #2076

Open · aeisenbarth opened this issue 1 month ago


Zarr version

v2.17.1

Numcodecs version

v0.12.1

Python Version

3.10

Operating System

Linux

Installation

Using pip into a conda environment

Description

When files are served over the network, the server must percent-encode certain characters in URLs (RFC 3986, section 2.2). When Zarr opens a dataset from a URL, the keys are taken from these percent-encoded file names without being decoded, so they differ from the names that were written.
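As an illustration of the encoding involved, this sketch uses only Python's `urllib.parse` to percent-encode a key containing each reserved character listed below and checks that decoding restores it. It is not Zarr code. The names `RESERVED` and `encoded` are just for this example.

```python
from urllib.parse import quote, unquote

# Reserved characters from RFC 3986, section 2.2, as listed in this report.
# "/" is left out: quote() treats it as safe by default, and it separates
# path segments (and Zarr hierarchy levels) anyway.
RESERVED = ":?#[]@!$&'()*+,;="

# Percent-encode a key containing each reserved character.
encoded = {ch: quote(f"a{ch}b") for ch in RESERVED}

for ch, enc in encoded.items():
    # Decoding must recover the original key exactly.
    assert unquote(enc) == f"a{ch}b"
    print(f"a{ch}b -> {enc}")
```

For the key used below, `quote("a+b")` gives `a%2Bb`, which is exactly the string Zarr reports.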

Steps to reproduce

Create a dataset with a key that contains any of the reserved characters (: /) ? # [ ] @ ! $ & ' ( ) * + , ; =. Here, the array key contains +.

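import numpy as np
import zarr

# Create (or open) a group in the local directory dataset.zarr
g = zarr.open_group("dataset.zarr")
# "+" is a reserved character in URLs (RFC 3986)
g.create_dataset(name="a+b", data=np.eye(3))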

Serve the dataset with a local HTTP server. From the directory where you saved the data, run:

python -m http.server

In a web browser, you can confirm that the link URLs in the directory listing are correctly percent-encoded (a%2Bb/), while the displayed file names are decoded (a+b/).
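The same check can be done without a browser (or Zarr). The following sketch uses only the standard library. It recreates a comparable layout in a temporary directory, with a `.zgroup` file and an `a+b/` directory standing in for what Zarr writes (their contents are placeholders). It serves that directory with `http.server` on a free localhost port and records each link's `href` and visible text. `QuietHandler`, `LinkCollector`, and `fetch_listing` are illustrative names, not part of Zarr or any other library.

```python
from __future__ import annotations

import functools
import http.server
import tempfile
import threading
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress per-request logging to stderr


class LinkCollector(HTMLParser):
    """Record (href, link text) pairs for every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href is not None:
            self.links.append((self._href, data))
            self._href = None


def fetch_listing(root: Path) -> list[tuple[str, str]]:
    """Serve `root` on a free localhost port and return the links in its listing."""
    handler = functools.partial(QuietHandler, directory=str(root))
    # Port 0 asks the OS for any free port; bind to loopback only.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any proxy configured in the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        port = server.server_address[1]
        with opener.open(f"http://127.0.0.1:{port}/dataset.zarr/") as resp:
            page = resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
    parser = LinkCollector()
    parser.feed(page)
    return parser.links


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Mimic the on-disk layout: a group directory containing an array directory "a+b".
    (root / "dataset.zarr" / "a+b").mkdir(parents=True)
    (root / "dataset.zarr" / ".zgroup").write_text("{}")
    links = fetch_listing(root)

print(links)
```

The printed list should contain the pair `('a%2Bb/', 'a+b/')`. The link target is percent-encoded, while the visible name is not.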

Try reading the dataset from a URL:

>>> g = zarr.open("http://0.0.0.0:8000/dataset.zarr/")
>>> list(g.keys())
['a%2Bb']
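The expected result is presumably `['a+b']`, the name the array was created with. The sketch below shows only the missing decoding step. It assumes the reported keys come straight from a percent-encoded listing. `decode_listing_keys` is a hypothetical helper, not a Zarr or fsspec API, and the report does not establish which layer should apply it.

```python
from urllib.parse import quote, unquote


def decode_listing_keys(keys):
    """Hypothetical helper: percent-decode keys taken from an HTTP listing.

    Each key is decoded exactly once, so a name that legitimately contains
    "%" (served as "%25") round-trips correctly.
    """
    return [unquote(k) for k in keys]


# Key reported by zarr in this issue.
print(decode_listing_keys(["a%2Bb"]))  # ['a+b']

# A file literally named "a%2Bb" is served as "a%252Bb". One decode restores it,
# while decoding twice would wrongly turn it into "a+b".
served = quote("a%2Bb")
print(served)                         # a%252Bb
print(decode_listing_keys([served]))  # ['a%2Bb']
print(unquote(unquote(served)))       # a+b
```

Decoding exactly once matters: a key that itself contains `%` would be corrupted by a second decode.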
