loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0

Allow groups to add the papers in which they first published/described their sequences #2948

Open corneliusroemer opened 1 week ago

corneliusroemer commented 1 week ago

One thing we're currently missing in our data model is the ability for a submitter to add publications in which they describe their sequences. This is understandable: it's not commonly done for rapidly shared sequences, where the publication happens after the sequences are shared.

However, papers often contain a lot of information about how the sequences were generated.

One way we could do this is to add a "publications" array to groups, in which groups can list their papers. Ideally, there would be one set of publications per organism, to scope it correctly. We could then add these publications to the projects that we submit to ENA. This is a very common pattern and something we should eventually do to take full advantage of the available cross-linking.
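To make the group-level idea concrete, here is a minimal sketch in Python. It is only an illustration: `Group`, `Publication`, and every field name are hypothetical and do not reflect the actual Loculus schema. Publications are keyed by organism so that each set stays scoped to one organism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Publication:
    # Hypothetical shape; a DOI is optional in this sketch.
    title: str
    doi: Optional[str] = None


@dataclass
class Group:
    # Hypothetical group record, not the real Loculus group model.
    group_id: int
    name: str
    # One list of publications per organism name.
    publications: Dict[str, List[Publication]] = field(default_factory=dict)

    def add_publication(self, organism: str, publication: Publication) -> None:
        """Attach a publication to this group for a single organism."""
        self.publications.setdefault(organism, []).append(publication)

    def publications_for(self, organism: str) -> List[Publication]:
        """Return a copy of the publications listed for one organism."""
        return list(self.publications.get(organism, []))


group = Group(group_id=1, name="Example lab")
group.add_publication(
    "organism-a", Publication(title="Example paper", doi="10.1234/example")
)
```

With this shape, a lookup for an organism without any listed papers simply returns an empty list.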

It's also something we can already import from GenBank: many sequences/projects already have papers listed, so we could surface them in the sequence view.

theosanderson commented 6 days ago

Ultimately, though, at ENA publications are associated at the level of sequences, right? So we would need some per-sequence annotation. To me, therefore, it might be better if the sequence entry is just annotated with an array of DOIs for the publications describing the sequence. While we might use a table to cache the titles and authors for each DOI, this wouldn't be a core part of our database - it would be fine to clear it and repopulate it from CrossRef.
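A rough sketch of the per-sequence variant, in Python for brevity. All names here (`SequenceEntry`, `DoiMetadataCache`, `fake_fetch`) are hypothetical. The metadata fetcher is passed in as a plain function, so the example runs without network access; a real one would presumably query CrossRef. The point is that only the DOI list on the entry is authoritative, while the cache can be dropped and rebuilt at any time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SequenceEntry:
    # Hypothetical entry: the DOI list is the only stored publication data.
    accession: str
    publication_dois: List[str] = field(default_factory=list)


@dataclass
class DoiMetadata:
    title: str
    authors: List[str]


class DoiMetadataCache:
    """Disposable cache of DOI metadata; safe to clear and repopulate."""

    def __init__(self, fetch: Callable[[str], DoiMetadata]) -> None:
        self._fetch = fetch  # assumed to look the DOI up in an external service
        self._entries: Dict[str, DoiMetadata] = {}

    def get(self, doi: str) -> DoiMetadata:
        key = doi.lower()  # DOI names are case-insensitive
        if key not in self._entries:
            self._entries[key] = self._fetch(doi)
        return self._entries[key]

    def clear(self) -> None:
        self._entries.clear()


fetch_calls: List[str] = []


def fake_fetch(doi: str) -> DoiMetadata:
    # Stand-in for a real lookup; records each call so caching is visible.
    fetch_calls.append(doi)
    return DoiMetadata(title="Title for " + doi, authors=["Example Author"])


cache = DoiMetadataCache(fake_fetch)
entry = SequenceEntry(accession="EXAMPLE_1", publication_dois=["10.1234/example"])
titles = [cache.get(doi).title for doi in entry.publication_dois]
```

After `cache.clear()`, the next `get` call just fetches again, and nothing on the sequence entry itself is lost.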

chaoran-chen commented 6 days ago

Good idea! Per-sequence also seems more flexible to me. Should it be a DOI, or can it just be any URL? We can still parse it and do special things if it's a DOI, and encourage people to put in a DOI, but I can also imagine cases where something is published in, e.g., a report, an old journal paper, or a book that does not have a DOI.
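As a sketch of what "accept any URL, but treat DOIs specially" could look like, again in Python: `classify_reference` is a hypothetical helper, and the DOI pattern is a commonly used heuristic for modern DOIs, not a full validator. Lowercasing the DOI is a normalization choice made for this example.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Heuristic pattern for modern DOIs: "10.", a 4-9 digit registrant code, "/", suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def classify_reference(raw: str) -> Tuple[str, str]:
    """Return ("doi", normalized DOI), ("url", URL) or ("invalid", input)."""
    text = raw.strip()
    candidate = text
    # Strip a resolver or "doi:" prefix, if present, before matching.
    for prefix in DOI_PREFIXES:
        if candidate.lower().startswith(prefix):
            candidate = candidate[len(prefix):].strip()
            break
    if DOI_PATTERN.match(candidate):
        return ("doi", candidate.lower())
    # Not a DOI: accept it as a plain web link if it parses as http(s).
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", text)
    return ("invalid", text)
```

A report hosted at an ordinary web address would come back as `"url"`, while both a bare DOI and a `https://doi.org/...` link would come back as `"doi"` with the same normalized value.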