nextstrain / auspice

Web app for visualizing pathogen evolution
https://docs.nextstrain.org/projects/auspice/
GNU Affero General Public License v3.0
291 stars 162 forks source link

Narrow the scope of open sharing acknowledgement inclusion #1809

Open victorlin opened 1 month ago

victorlin commented 1 month ago

This text:

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions.

is added to any dataset on https://nextstrain.org without a description as a result of these lines:

https://github.com/nextstrain/auspice/blob/ca3dc6f5ee164f14cea20b7dceff268800a19caf/src/components/framework/footer.js#L171-L176

I don't think it applies to all datasets served by https://nextstrain.org, especially not for groups/community/fetch datasets which are not maintained by us.

Possible solutions

  1. ⛔️ Include only a hardcoded list of prefixes (e.g. /ncov, /mpox, all core datasets)
  2. ⛔️ Update the condition to be exclusionary based on path prefixes (e.g. /groups, /community, /fetch)
  3. Ensure all of our own datasets (coreBuildPaths) have a description field in their metadata and remove the hardcoding entirely (ref)
tsibley commented 4 weeks ago

I strongly prefer not hardcoding nextstrain.org-specific bits into Auspice. This rules out both possible solutions (1) and (2) above IMO. We've been moving away from this practice of special-casing over time, and I don't think we should slide backwards.

I suggest possible solution (3): ensuring all of our own datasets have a description field in their metadata so this codepath is used

https://github.com/nextstrain/auspice/blob/ca3dc6f5ee164f14cea20b7dceff268800a19caf/src/components/framework/footer.js#L158-L169

and then doing away with the hardcoding for nextstrain.org in Auspice.

Many (most?) of our datasets already do have a description, often with acknowledgements and links to download data. Those are displayed in the footer, but they're not displayed in the "Download data" modal because getAcknowledgements() is called without the dataset metadata:

https://github.com/nextstrain/auspice/blob/dddc6c9b2c3f95ec3f9048d4681403e20bd7191b/src/components/download/downloadModal.js#L96

Given the expected contents of description is acknowledgements (and maybe data download info), it seems to me we'd want to include that specific info in the "Download data" modal. That might be somewhat of a breaking change to the expected UI though?

joverlee521 commented 4 weeks ago

Related to https://github.com/nextstrain/auspice/issues/1800

victorlin commented 3 weeks ago

Originally I said:

This text [...] is added to all datasets on https://nextstrain.org/ as a result of these lines:

At dev chat I was corrected – it's not added to all datasets. It's only added if the dataset doesn't provide its own description. The proper solution is what @tsibley proposed in https://github.com/nextstrain/auspice/issues/1809#issuecomment-2291899869, and this would also apply to #1800. I've updated the issue description to reflect this.

genehack commented 3 weeks ago

Given the expected contents of description is acknowledgements (and maybe data download info), it seems to me we'd want to include that specific info in the "Download data" modal. That might be somewhat of a breaking change to the expected UI though?

I added this to the dev chat agenda for next time; if we can assemble consensus here before that, we can skip it, but hopefully it's a backstop to getting a decision made. (Adding it seems like the way to go, to me?)

victorlin commented 3 weeks ago

That might be somewhat of a breaking change to the expected UI though?

We could display it in both places? Similar reasoning to https://github.com/nextstrain/auspice/pull/1715#discussion_r1367308747

victorlin commented 3 weeks ago

Issue description updated with a list of all core datasets. I've checked off the ones that have a description already – there are only 4 that need updating. I'll make issues in the individual pathogen repos to figure out what to include in the custom description.

tsibley commented 3 weeks ago

That might be somewhat of a breaking change to the expected UI though?

We could display it in both places? Similar reasoning to #1715 (comment)

Ah, I didn't mean "move it", I meant "also include it". So, yeah, both places. By breaking UI change, I meant that the dataset-provided description can currently get quite long and may 1) break the UI of the download modal and/or 2) be written with the expectation of only being in the footer and not make sense in the context of the download modal (e.g. relative references like "use filters above").

victorlin commented 6 days ago

Issues have been created. A few options going forwards:

  1. Wait for the custom config to be added to the live dataset.
    • This would require not only adding the description and hooking it up to the workflow, but a re-run of the workflow itself to generate a new dataset file that includes the description.
  2. Manually inject the current description into the current dataset file.

    • This would require editing the JSON file on S3 and manually adding

      "meta": {
          "description": "This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions."
      }
  3. Continue to remove hardcoding without ensuring live builds have been updated.

(1) seems like the proper approach. However, some of the datasets that need updating aren't actively maintained. This means progress on removing hardcoding is likely to be blocked for a long time. Maybe this is fine though as it's not a pressing issue.