Improve dataset URLs as defined within narratives

jameshadfield commented 4 years ago

Narratives currently define a dataset URL in their YAML frontmatter and the title for each page.

The current implementation is ad-hoc:

We only uses the pathname -- i.e. running locally but specifying nextstrain.org/ebola will actually source the localhost dataset. This is because all auspice requests are made to the server the client's running from & only use the dataset pathname.
the domain needs to be localhost or nextstrain.org (this restriction should definitely be removed).
Only the frontmatter dataset URL is used -- for the titles of each page we only use the URL query to modify the original dataset. One day it'd be great to change datasets (and this was the originally intention), but this requires wider changes to auspice.

Note that narratives are designed so that any narrative can refer to any dataset (within the same domain). E.g. a nextstrain community narrative could source the core nCoV build.

This has become confusing for narrative authors as we build more narratives which rely on non-core datasets. For instance, whilst building a narrative for the community dataset nextstrain.org/community/scratch/ncov-demo, the file was stored locally as scratch_ncov-demo.json. This meant that the narrative URLs had to change between local development and deployment, from:

dataset: "https://localhost:4000/scratch/ncov-demo"

to

dataset: "https://nextstrain.org/community/jameshadfield/scratch/ncov-demo?d=tree,map"

(Confusingly the domain could be nextstrain or localhost for either, see above).

This problem exists for nextstrain groups as well.

alliblk commented 4 years ago

@jameshadfield: some comments on Nextstrain URLs that I've been running into. I realize that this might be a lot of things (or some of these might be more substantial changes to the code base) but this seemed like a good place to put them.

It would be nice if there was someway to source relative paths for the URLs in the markdown files. I'm usually developing using localhost links, but needing to change them when I push to github (or deploy to nextstrain.org). While that isn't hugely onerous, it does offer up a lot of chances to make mistakes/not fix a URL, which is something I find myself doing frequently and then getting JSONs not found errors.
Would eventually be awesome if you could source multiple datasets. I'm thinking of this in terms of being able to refer to a current build and a previous one to compare them. How easy would it be in the "dataset" portion of the editor to allow multiple datasets?
I think it would be amazing if you could make your own "dashboard" community page in the event that you have multiple narratives pertaining to the same project that you'd like to group together. While I realize that you can just name the files "wrong" (i.e. don't have any files that are named the same as the repo), this produces the list AND a big red 404 error. It would be really nice to just get rid of the error message here. As a current workaround I've made 5 different branches for each narrative that I want, and then on each branch the narrative I want displaying there is called .md (and same for the auspice JSONs as well). While this is an okay workaround (I have 5 URLs that include dates that take me to the correct narratives I want to display) it feels a little quirky to have different files named the same thing but on different branches.

tsibley commented 4 years ago

I talked about this with both of you separately, but I wanted to concur here that I think relative paths to datasets in the frontmatter and relative paths in each page's dataset URL would solve a lot of the issues. Particularly, it'd be possible, without changing any paths or adding any special handling to Auspice, to preview community or group narratives locally before pushing/uploading them.

It also makes the three weirdnesses James described in the top post at lot less weird:

A relative path in the frontmatter dataset key makes it clear that the domain doesn't matter.
The domain won't be present, so doesn't have to be a specific set of values.
The URLs for each path can include just the query component, which makes it really clear that the dataset isn't changing, just the view params. For example, # [Map View](?d=map) and # [Looping Animation](?animate=2009-01-22,2017-07-22,1,0,15000&d=map). When we do support multiple datasets, it'll be clear when the dataset changes and when it doesn't because page URLs will reflect that.

One thing to figure out is how a narrative refers to its default or "natural" dataset complement, which I think is important for community/group narratives but also some core narratives like ncov. Definitely welcome input and thoughts here. Particularly, I think it'd be useful for a community narrative on a specific branch to be able to refer to the dataset on that same branch, without explicitly naming the branch. This would allow, among other things, a nice convention for versioning.

What I mean by default or natural dataset is:

narrative url	default/natural dataset url
/community/narratives/blab/ebola-narrative-ms	/community/blab/ebola-narrative-ms
/community/narratives/blab/ebola-narrative-ms@sitrep-2019-10-15	/community/blab/ebola-narrative-ms@sitrep-2019-10-15
/narratives/ncov/sit-rep/2020-01-30	/ncov/sit-rep/2020-01-30

One way the narrative could indicate this is by omitting the dataset key from its frontmatter, in which case the default/natural dataset is based on the narrative's path. Another way may be to make it explicitly dataset: ..

jameshadfield commented 4 years ago

Relative paths seem like a good idea. @tsibley @alliblk could you help me understand what you have in mind here 👇

narrative pathname	YAML dataset	dataset pathname (i.e. `getDataset` prefix)
/narratives/x	`./y`	/y
/groups/abc/narratives/x	`./y`	/groups/abc/y
/community/narratives/user/repo[@branch]	`./repo`	/community/user/repo
/community/narratives/user/repo[@branch]	`./y`	/community/user/repo/y
/community/narratives/user/repo[@branch]/x	`./y`	/community/user/repo/x/y ???
/narratives/z	`/ncov`	/ncov
/groups/abc/narratives/z	`/ncov`	/ncov

@tsibley One thing to figure out is how a narrative refers to its default or "natural" dataset complement

This seems like a nice idea, and seems to follow on from allowing relative links. As we add more features like this we just need to make sure we have very clear documentation.

@alliblk could you elaborate on your point 3 - I don't quite understand. Is this at all what you mean: https://nextstrain.org/community/jameshadfield/scratch/

@tsibley The URLs for each path can include just the query component

This does have some appeal. Originally, I designed the markdown with the intention that it's viewable as a markdown by itself (i.e. the headers would be valid links to what the paragraph describes), however relative URLs would also move us away from this original intention.

@alliblk How easy would it be in the "dataset" portion of the editor to allow multiple datasets?

Not easy 😢

alliblk commented 4 years ago

@jameshadfield Yes, https://nextstrain.org/community/jameshadfield/scratch/ is totally what I'd like to have and what I was trying to convey in point 3 :)

jameshadfield commented 4 years ago

Awesome! If you need help just let me know. P.S. for this to work you can't have a default dataset -- i.e. in my example there's no scratch.json dataset.

tsibley commented 4 years ago

@jameshadfield

Thanks for starting that decision table. One thing is that the @branch should be preserved in the dataset path when present in the narrative path. I think that's what you meant, but the dataset path column doesn't reference it like the narrative path column.

Tinkering with the table to think through this a bit more and get at the unclear cases, I think there's a fundamental tension between:

We want to use the same path handling behaviour that is ubquitious in computing, as we've been talking about. I think this is really non-negotiable if we're going to support relative paths.
But, Nextstrain paths (for datasets and narratives) act like both directories and files at the same time, which is problematic. The way web servers usually deal with this is by appending a trailing slash for directories, which makes relative path handling sane.

As we add more features like this we just need to make sure we have very clear documentation.

Agreed!

@alliblk could you elaborate on your point 3 - I don't quite understand. Is this at all what you mean: https://nextstrain.org/community/jameshadfield/scratch/

However, compare: https://nextstrain.org/community/jameshadfield/scratch/ to https://nextstrain.org/community/narratives/jameshadfield/scratch/.

This does have some appeal. Originally, I designed the markdown with the intention that it's viewable as a markdown by itself (i.e. the headers would be valid links to what the paragraph describes), however relative URLs would also move us away from this original intention.

Ah, I see. It's nice for the links to work in a narrative-naive Markdown render, but seems like it would also be nice not to require parts of the URL which are ignored by Auspice. I do think support for relative, query-only URLs would be additional to the current support for absolute URLs, so this would then be a decision by the narrative author.

nextstrain / auspice

Improve dataset URLs as defined within narratives #890