nextstrain / pathogen-repo-guide


Add template for phylogenetic `description.md` #48

Open joverlee521 opened 3 months ago

joverlee521 commented 3 months ago

Context

Originally brought up by @trvrb and @genehack on Slack.

Description

Since the pathogen ingest workflow is now largely standardized around public NCBI data, it makes sense to include a generic `description.md` that phylogenetic workflows can use to acknowledge the source of the underlying data.

Examples

Possible solution

Recommended text:

We gratefully acknowledge the authors, originating and submitting laboratories of the genetic sequences and metadata for sharing their work. Please note that although data generators have generously shared data in an open fashion, that does not mean there should be free license to publish on this data. Data generators should be cited where possible and collaborations should be sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if uncertain.

We curate sequence data and metadata from NCBI as a starting point for our analyses. Curated sequences and metadata are available as flat files at:

- data.nextstrain.org/files/workflows/{pathogen}/sequences.fasta.zst
- data.nextstrain.org/files/workflows/{pathogen}/metadata.tsv.zst

joverlee521 commented 3 months ago

As noted in https://github.com/nextstrain/seasonal-cov/pull/24#discussion_r1659407440, the data listing section should only be added if the files exist on S3. The file paths can also differ depending on whether the pathogen includes subtypes or segments that are stored in separate files.
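
To make those two conditions concrete, here is a rough sketch of how a template could be rendered. It is only an illustration, not something that exists in the guide: the function names, the `files_on_s3` flag, the `subdirs` parameter, and the idea that subtypes or segments live in subdirectories under `{pathogen}/` are all assumptions, and `my-pathogen` is a placeholder name. The acknowledgement text is shortened here.

```python
# Hypothetical sketch: render description.md text for a phylogenetic workflow.
# DATA_ROOT mirrors the path in the recommended text; everything else is assumed.
DATA_ROOT = "data.nextstrain.org/files/workflows"

# Shortened version of the recommended acknowledgement paragraph.
ACKNOWLEDGEMENT = (
    "We gratefully acknowledge the authors, originating and submitting "
    "laboratories of the genetic sequences and metadata for sharing their work."
)


def data_urls(pathogen, subdirs=("",)):
    """Return sequences/metadata paths, one pair per (assumed) subdirectory.

    An empty string in `subdirs` means the files sit directly under the
    pathogen directory, as in the recommended text.
    """
    urls = []
    for subdir in subdirs:
        prefix = "/".join(part for part in (DATA_ROOT, pathogen, subdir) if part)
        urls.append(f"{prefix}/sequences.fasta.zst")
        urls.append(f"{prefix}/metadata.tsv.zst")
    return urls


def render_description(pathogen, files_on_s3, subdirs=("",)):
    """Build the description text; list data files only if they exist on S3."""
    paragraphs = [ACKNOWLEDGEMENT]
    if files_on_s3:
        listing = "\n".join(f"- {url}" for url in data_urls(pathogen, subdirs))
        paragraphs.append(
            "We curate sequence data and metadata from NCBI as a starting point "
            "for our analyses. Curated sequences and metadata are available as "
            "flat files at:\n\n" + listing
        )
    return "\n\n".join(paragraphs) + "\n"


if __name__ == "__main__":
    print(render_description("my-pathogen", files_on_s3=True))
```

Whether the files exist on S3 is passed in as a plain boolean, so the sketch stays offline; how that gets checked is left open.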