hoelzer / mgnify-lr

Evaluation of long-read support for the MGnify pipeline
GNU General Public License v3.0

output ENA project.xml file #9

Closed hoelzer closed 4 years ago

hoelzer commented 4 years ago

The STUDY (alias) should be the same for every assembly uploaded to the same project. You will also need to generate a project.xml file that looks something like this:

<PROJECT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <PROJECT alias="SRP189971_1586f07841cbf8f05e203684ee3b981b">
        <TITLE>EMG produced TPA metagenomics assembly of PRJNA530103 data set (Viral metagenome groundwater aquifer).</TITLE>
        <DESCRIPTION>The Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set SRP189971 and was assembled with metaSPAdes (v3.13.0).</DESCRIPTION>
        <SUBMISSION_PROJECT>
            <SEQUENCING_PROJECT/>
        </SUBMISSION_PROJECT>
        <PROJECT_LINKS>
            <PROJECT_LINK>
                <XREF_LINK>
                    <DB>PUBMED</DB>
                    <ID>29069476</ID>
                </XREF_LINK>
            </PROJECT_LINK>
        </PROJECT_LINKS>
        <PROJECT_ATTRIBUTES>
            <PROJECT_ATTRIBUTE>
                <TAG>new_study_type</TAG>
                <VALUE>Metagenomic assembly</VALUE>
            </PROJECT_ATTRIBUTE>
        </PROJECT_ATTRIBUTES>
    </PROJECT>
</PROJECT_SET>
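If you would rather not hand-edit this template, a minimal Python sketch can emit the same structure using the standard library's `xml.etree.ElementTree`. The function name `build_project_xml` and its parameters are hypothetical and are not taken from any MGnify script. The `ET.indent` call assumes Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_project_xml(alias: str, title: str, description: str,
                      pubmed_id: Optional[str] = None) -> str:
    """Hypothetical helper: return a PROJECT_SET document shaped like the example."""
    project_set = ET.Element(
        "PROJECT_SET",
        {"xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance"},
    )
    # The alias must match the STUDY value used in every manifest file.
    project = ET.SubElement(project_set, "PROJECT", {"alias": alias})
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description

    # Empty <SEQUENCING_PROJECT/> nested in <SUBMISSION_PROJECT>, as in the example.
    submission = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission, "SEQUENCING_PROJECT")

    # Optional PubMed cross-reference; omitted entirely when no ID is given.
    if pubmed_id is not None:
        links = ET.SubElement(project, "PROJECT_LINKS")
        link = ET.SubElement(links, "PROJECT_LINK")
        xref = ET.SubElement(link, "XREF_LINK")
        ET.SubElement(xref, "DB").text = "PUBMED"
        ET.SubElement(xref, "ID").text = pubmed_id

    # Fixed study-type attribute copied from the example.
    attrs = ET.SubElement(project, "PROJECT_ATTRIBUTES")
    attr = ET.SubElement(attrs, "PROJECT_ATTRIBUTE")
    ET.SubElement(attr, "TAG").text = "new_study_type"
    ET.SubElement(attr, "VALUE").text = "Metagenomic assembly"

    ET.indent(project_set)  # pretty-print in place (Python 3.9+)
    return ET.tostring(project_set, encoding="unicode")
```

Writing the result to disk is then just `Path("project.xml").write_text(build_project_xml(alias, title, description, "29069476"))`, with `Path` imported from `pathlib`.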

Again, the project alias needs to match the STUDY alias in all manifest files. The PubMed ID, title and description can be whatever you wish.
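One way to catch a mismatch before uploading is to compare the STUDY field of every manifest against the project alias. This sketch assumes each manifest is a plain-text file with one field name and value per line, separated by whitespace (Webin-CLI style). The helper names and the exact, case-sensitive comparison are illustrative assumptions.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def parse_manifest(text: str) -> Dict[str, str]:
    """Split each non-blank line into a field name and a value (assumed format)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def load_manifests(paths: Iterable[Path]) -> Dict[str, str]:
    """Hypothetical helper: map each manifest path to its text content."""
    return {str(p): p.read_text() for p in paths}


def find_mismatches(project_alias: str, manifests: Dict[str, str]) -> List[str]:
    """Return names of manifests whose STUDY is missing or differs from the alias."""
    return [
        name
        for name, text in manifests.items()
        if parse_manifest(text).get("STUDY") != project_alias
    ]
```

For example, `find_mismatches(alias, load_manifests(Path("manifests").glob("*.manifest")))` would list the offending files. The directory name and file extension here are hypothetical.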

hoelzer commented 4 years ago

FYI, we have a (somewhat messy at the moment) script that generates these files for us. If you want to see how we generate aliases (we use combined MD5s), it is here: /nfs/production/metagenomics/production/production-scripts/TPA-uploads/manifest.py. The alias can be anything you want, though; it is only temporary.
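The exact recipe in `manifest.py` is not reproduced here, so the following is only a guess at what a "combined MD5" alias could look like. It sorts the per-file MD5 digests, hashes them again, and prefixes the result with the study accession. This matches the shape of the example alias (`SRP189971_` followed by 32 hex characters), but it will not reproduce that particular value. All function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def combined_md5(digests: Iterable[str]) -> str:
    """Hypothetical combination: MD5 of the sorted, concatenated hex digests."""
    h = hashlib.md5()
    for digest in sorted(digests):  # sorting makes the result order-independent
        h.update(digest.encode("ascii"))
    return h.hexdigest()


def make_alias(study_accession: str, digests: Iterable[str]) -> str:
    """Hypothetical alias format: '<study accession>_<combined md5>'."""
    return f"{study_accession}_{combined_md5(digests)}"
```

Sorting the digests means the alias does not depend on the order in which assembly files are listed. Because the alias is only temporary, any stable, unique string would serve equally well.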