loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0

Improve project deposition mapping/structure #2977

Closed corneliusroemer closed 4 weeks ago

corneliusroemer commented 1 month ago

Some ideas for improving the project submissions are in this thread: https://loculus.slack.com/archives/C0757PTR607/p1728464162216809

Currently, our projects are displayed like this: [image]

We could aim for something more like this: [image]

This is how it shows up in NCBI: [image]

Corresponding ENA project: https://www.ebi.ac.uk/ena/browser/view/PRJEB80643

The first sample we submitted was incorrectly labelled as genomic DNA; this has been fixed by submitting an EMBL file.

anna-parker commented 1 month ago

This is the standard project XML we currently submit:

<?xml version="1.0" encoding="utf-8"?>
<PROJECT_SET>
    <PROJECT center_name="{center_name}" alias="1:cchf:Loculus">
        <NAME>Orthonairovirus haemorrhagiae</NAME>
        <TITLE>Orthonairovirus haemorrhagiae: Genome sequencing</TITLE>
        <DESCRIPTION>Automated upload of Orthonairovirus haemorrhagiae sequences submitted by {center_name} from Loculus</DESCRIPTION>
        <SUBMISSION_PROJECT>
            <SEQUENCING_PROJECT>
            </SEQUENCING_PROJECT>
            <ORGANISM>
                <TAXON_ID>3052518</TAXON_ID>
                <SCIENTIFIC_NAME>Orthonairovirus haemorrhagiae</SCIENTIFIC_NAME>
            </ORGANISM>
        </SUBMISSION_PROJECT>
        <PROJECT_LINKS>
            <PROJECT_LINK>
                <XREF_LINK>
                    <DB>Loculus</DB>
                    <ID>1</ID>
                </XREF_LINK>
            </PROJECT_LINK>
        </PROJECT_LINKS>
    </PROJECT>
</PROJECT_SET>

corneliusroemer commented 1 month ago

I wonder what the best info to include is. We could mention the country, and possibly the city/state, of the submitter, since that is often a good proxy for where the sequences are from, even if it is not 100% accurate.

So we could do:

Name: West Nile Virus Genome Sequencing
Title: West Nile Virus genome submissions by Institution Name, City, Country to Pathoplexus
Description: Automated upload of West Nile Virus sequences originally submitted by Institution Name, City, Country to Pathoplexus

That way each field carries a bit more information. It would also be nice to have a proper group display name, something like "Grubaugh Lab", in addition to the institution, which on its own is not as specific as it could be. We don't have such a field yet, so there is no point in adding it now, but we could do this in the future.
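
To make the proposal concrete, here is a minimal sketch of how the three fields could be generated and placed into the project XML. It assumes Python and the standard library's `xml.etree.ElementTree`; the function names `project_texts` and `build_project_set`, their parameters, and the "Example Center" value are hypothetical and not taken from the Loculus codebase. The element layout simply mirrors the XML posted above.

```python
import xml.etree.ElementTree as ET


def project_texts(organism, institution, city, country, database="Pathoplexus"):
    """Build the proposed NAME/TITLE/DESCRIPTION strings (hypothetical helper)."""
    # Skip empty parts so a missing city does not leave a dangling comma.
    submitter = ", ".join(part for part in (institution, city, country) if part)
    return {
        "NAME": f"{organism} Genome Sequencing",
        "TITLE": f"{organism} genome submissions by {submitter} to {database}",
        "DESCRIPTION": (
            f"Automated upload of {organism} sequences originally submitted "
            f"by {submitter} to {database}"
        ),
    }


def build_project_set(alias, center_name, taxon_id, scientific_name, texts,
                      xref_db="Loculus", xref_id="1"):
    """Assemble a PROJECT_SET document shaped like the XML shown in this thread."""
    root = ET.Element("PROJECT_SET")
    project = ET.SubElement(root, "PROJECT", center_name=center_name, alias=alias)
    for tag in ("NAME", "TITLE", "DESCRIPTION"):
        ET.SubElement(project, tag).text = texts[tag]

    submission = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission, "SEQUENCING_PROJECT")
    organism = ET.SubElement(submission, "ORGANISM")
    ET.SubElement(organism, "TAXON_ID").text = str(taxon_id)
    ET.SubElement(organism, "SCIENTIFIC_NAME").text = scientific_name

    links = ET.SubElement(project, "PROJECT_LINKS")
    xref = ET.SubElement(ET.SubElement(links, "PROJECT_LINK"), "XREF_LINK")
    ET.SubElement(xref, "DB").text = xref_db
    ET.SubElement(xref, "ID").text = str(xref_id)

    return ET.tostring(root, encoding="unicode")


# Example using the placeholder submitter from the proposal and the
# organism values from the XML above.
texts = project_texts("West Nile Virus", "Institution Name", "City", "Country")
print(build_project_set(
    alias="1:cchf:Loculus",
    center_name="Example Center",  # hypothetical value
    taxon_id=3052518,
    scientific_name="Orthonairovirus haemorrhagiae",
    texts=texts,
))
```

Building the document with `ElementTree` rather than string formatting means any `&` or `<` in an institution name gets escaped automatically, which matters once free-text submitter details end up in the title and description.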