nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org
GNU Affero General Public License v3.0

fetch: allow colons (:) in dataset URLs #702

Open joverlee521 opened 11 months ago

joverlee521 commented 11 months ago

Context

The colon (:) is reserved as the separator for the dual-tree display, so we explicitly split all paths on colons.

This results in 404 errors when loading URLs that contain colons through /fetch.

For example, Nextclade dataset reference trees with timestamps in their URLs cannot be loaded, because the URL is truncated at the first colon: https://nextstrain.org/fetch/data.clades.nextstrain.org/datasets/flu_h3n2_ha/references/EPI1857216/versions/2023-04-02T12:00:00Z/files/tree.json
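A minimal Python sketch of why the timestamped URL breaks. It assumes, hypothetically, that the server treats every colon as a dual-tree separator and uses the first segment as the main dataset; `split_dataset_path` is an illustrative name, not the actual nextstrain.org code.

```python
# Hypothetical model of the current behaviour (not nextstrain.org's code):
# every ":" in the path is treated as the dual-tree separator.
def split_dataset_path(path: str) -> list[str]:
    return path.split(":")


url_path = (
    "fetch/data.clades.nextstrain.org/datasets/flu_h3n2_ha/references/"
    "EPI1857216/versions/2023-04-02T12:00:00Z/files/tree.json"
)

parts = split_dataset_path(url_path)
print(len(parts))  # 3: the two colons in the timestamp produce two extra segments
print(parts[0])    # the truncated main dataset path, ending in ".../2023-04-02T12"
```

The first segment no longer points at tree.json, which is consistent with the 404s described above.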

This behavior was reported by @corneliusroemer on Slack.

tsibley commented 10 months ago

A workaround already exists: double-encode the colons, so that : becomes %253a, e.g.

https://nextstrain.org/fetch/data.clades.nextstrain.org/datasets/flu_h3n2_ha/references/EPI1857216/versions/2023-04-02T12%253a00%253a00Z/files/tree.json
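The two encoding layers can be sketched with Python's standard urllib.parse. This assumes one round of decoding happens before the path is split on colons and a second one happens later (the thread does not say exactly where); `double_encode` is a hypothetical helper. Note that `quote()` emits uppercase hex (%253A), which is equivalent to the lowercase %253a above, since percent-encoding hex digits are case-insensitive.

```python
from urllib.parse import quote, unquote


def double_encode(path: str) -> str:
    # Percent-encode everything except "/" twice: ":" -> "%3A" -> "%253A".
    # Letters, digits and "-._~" are never encoded by quote().
    return quote(quote(path, safe="/"), safe="/")


path = (
    "fetch/data.clades.nextstrain.org/datasets/flu_h3n2_ha/references/"
    "EPI1857216/versions/2023-04-02T12:00:00Z/files/tree.json"
)

encoded = double_encode(path)
once = unquote(encoded)   # first decode: colons are still escaped as "%3A"
twice = unquote(once)     # second decode: the real colons come back

print(encoded)
print(":" in once)        # False, so splitting on ":" leaves the path intact
print(twice == path)      # True
```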

However, Auspice (I believe) updates the URL after loading and ends up with the colons encoded only once, so the displayed URL isn't usable as-is (e.g. reloading the page doesn't work). We could potentially update Auspice to DTRT (do the right thing) here when it gets an encoded colon.

jameshadfield commented 3 months ago

> This results in 404 errors when loading URLs with colons through /fetch

Allowing : in fetch paths whilst also supporting tanglegrams such as /fetch/data.nextstrain.org/measles_genome.json:fetch/data.nextstrain.org/measles_N450.json seems like it would take a lot of error-prone path-checking code. Double-encoding the colons is a nice solution.
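A rough Python sketch of why double-encoding keeps tanglegram parsing simple. `split_tanglegram` is hypothetical: it decodes once, splits on the literal colons that remain, then decodes each half again. In practice the second decode may happen elsewhere; this sketch does both in one place for clarity. The flu tree is paired with an unrelated measles dataset purely to exercise the parser.

```python
from urllib.parse import quote, unquote


def split_tanglegram(request_path: str) -> list[str]:
    # Hypothetical parser, not the nextstrain.org implementation.
    decoded = unquote(request_path)        # first decode
    parts = decoded.split(":")             # remaining ":" = dual-tree separator
    return [unquote(p) for p in parts]     # second decode restores real colons


# The tanglegram from the comment: no colons inside either dataset path.
measles = ("fetch/data.nextstrain.org/measles_genome.json:"
           "fetch/data.nextstrain.org/measles_N450.json")
print(split_tanglegram(measles))

# A colon-containing dataset path, double-encoded, used as the left tree.
flu = ("fetch/data.clades.nextstrain.org/datasets/flu_h3n2_ha/references/"
       "EPI1857216/versions/2023-04-02T12:00:00Z/files/tree.json")
flu_encoded = quote(quote(flu, safe="/"), safe="/")
left, right = split_tanglegram(flu_encoded + ":fetch/data.nextstrain.org/measles_N450.json")
print(left == flu)  # True
```

Because only literal colons act as separators, the parser needs no heuristics to tell a timestamp colon from the dual-tree separator.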

> Auspice (I believe) updates the URL after load and ends up only singly-encoding colons so the displayed URL isn't usable as is (e.g. reloading the page doesn't work). We could potentially update Auspice to DTRT here when it gets an encoded colon.

Yeah, Auspice is doing a single round of URL decoding. Happy for changes to Auspice to be proposed here. BTW Auspice logs this change, e.g. for your link above:

Pathname for "fetch/data.clades.nextstrain.org/datasets/flu_h3n2_ha/references/EPI1857216/versions/2023-04-02T12%253a00%253a00Z/files/tree.json" changing to "fetch/data.clades.nextstrain.org/datasets/flu_h3n2_ha/references/EPI1857216/versions/2023-04-02T12%3a00%3a00Z/files/tree.json".
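The logged change is exactly one round of percent-decoding, which Python's urllib.parse.unquote reproduces. The `reencode_for_display` helper is a hypothetical sketch of the Auspice-side fix discussed above, written in Python for consistency even though Auspice itself is JavaScript. It only undoes the decoding of %25 escapes, so a real fix would also need to handle other encoded characters.

```python
from urllib.parse import unquote

requested = (
    "fetch/data.clades.nextstrain.org/datasets/flu_h3n2_ha/references/"
    "EPI1857216/versions/2023-04-02T12%253a00%253a00Z/files/tree.json"
)

# A single decode turns each "%253a" into "%3a", matching the logged pathname.
displayed = unquote(requested)
print(displayed)


def reencode_for_display(pathname: str) -> str:
    # Hypothetical fix: escape "%" as "%25" before writing the pathname back
    # to the address bar, so a reload requests the double-encoded form again.
    return pathname.replace("%", "%25")


print(reencode_for_display(displayed) == requested)  # True
```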