loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0

Ingest: Explore submitting updated metadata as biosample updates to ENA #3086

Closed corneliusroemer closed 2 weeks ago

corneliusroemer commented 4 weeks ago

We should test how well metadata updates work when submitted as updated biosamples, without doing any automation or pipeline development yet.

anna-parker commented 3 weeks ago

I will attempt to revise the taxonId for https://pathoplexus.org/seq/PP_000RGTE.1 and https://pathoplexus.org/seq/PP_000RGSG.1.

It looks like I can do this quite easily using dry-run and https://ena-docs.readthedocs.io/en/latest/update/metadata/programmatic-sample.html
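For the taxon revision itself, a minimal sketch of editing an existing sample.xml could look like the following. It assumes the layout shown in the ENA docs, where each `SAMPLE` has a `SAMPLE_NAME` child holding `TAXON_ID` and optionally `SCIENTIFIC_NAME`. The function name `set_taxon` is hypothetical, and the taxon ID and name in the usage note are placeholders, not the real values.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def set_taxon(sample_xml: str, alias: str, taxon_id: int,
              scientific_name: Optional[str] = None) -> str:
    """Return sample_xml with the taxon of the sample named `alias` replaced.

    Assumes the ENA sample layout: SAMPLE / SAMPLE_NAME / TAXON_ID.
    """
    root = ET.fromstring(sample_xml)
    # The document root may be a SAMPLE_SET wrapper or a single SAMPLE.
    samples = [root] if root.tag == "SAMPLE" else root.findall("SAMPLE")
    for sample in samples:
        if sample.get("alias") != alias:
            continue
        name = sample.find("SAMPLE_NAME")
        taxon = name.find("TAXON_ID") if name is not None else None
        if taxon is None:
            raise ValueError(f"sample {alias!r} has no SAMPLE_NAME/TAXON_ID")
        taxon.text = str(taxon_id)
        if scientific_name is not None:
            sci = name.find("SCIENTIFIC_NAME")
            if sci is None:
                # Keep SCIENTIFIC_NAME directly after TAXON_ID.
                sci = ET.Element("SCIENTIFIC_NAME")
                name.insert(list(name).index(taxon) + 1, sci)
            sci.text = scientific_name
        return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no sample with alias {alias!r}")
```

Usage would be something like `set_taxon(xml_text, "PP_000RGSG:west-nile:Pathoplexus", 12345, "placeholder name")`, where `12345` stands in for the corrected taxon ID. All other fields of the sample are left as they are, since the docs ask for the full sample.xml to be resubmitted.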

anna-parker commented 3 weeks ago

The updates go through on dev, but I see no changes on the page. The command I ran:

curl -u "$username:$password" -F 'SUBMISSION=@sample/submission.xml' -F 'SAMPLE=@sample/sample.xml' https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/ > results/output.txt

As in the docs, I used the full sample.xml together with a submission.xml that contains the MODIFY action:

<SUBMISSION>
     <ACTIONS>
         <ACTION>
             <MODIFY/>
         </ACTION>
     </ACTIONS>
</SUBMISSION>
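If this ever gets scripted, the submission file above could be generated rather than kept by hand. A small sketch, assuming Python 3.9+ for `ET.indent`; the function name `build_submission_xml` is made up for illustration:

```python
import xml.etree.ElementTree as ET


def build_submission_xml(action: str = "MODIFY") -> str:
    """Build a SUBMISSION document with a single action element."""
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    wrapper = ET.SubElement(actions, "ACTION")
    ET.SubElement(wrapper, action)  # empty element named after the action
    ET.indent(submission, space="     ")  # pretty-print; needs Python 3.9+
    return ET.tostring(submission, encoding="unicode")
```

Writing the returned string to `sample/submission.xml` would give a file equivalent to the one used in the curl call.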

anna-parker commented 3 weeks ago

It could just be that the dev site is down. I have now tried to update the page through the browser and I get an error:

image

anna-parker commented 3 weeks ago

OK, I tried on the main site as well. The programmatic submission again resulted in a successful curl request but no changes on the page. However, I was able to edit the page directly in the browser, and that update was successful.

anna-parker commented 3 weeks ago

Update: I do find an error when I submit updates programmatically on dev:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="receipt.xsl"?>
<RECEIPT receiptDate="2024-11-05T09:54:05.497Z" submissionFile="submission.xml" success="false">
     <SAMPLE alias="PP_000RGSG:west-nile:Pathoplexus" status="PUBLIC"/>
     <SUBMISSION alias="SUBMISSION-05-11-2024-09:54:05:240"/>
     <MESSAGES>
          <ERROR>Failed to submit samples to BioSamples</ERROR>
          <ERROR>Failed to submit samples to BioSamples</ERROR>
          <ERROR>An exception occurred: java.lang.RuntimeException: Failed to submit all samples to BioSamples</ERROR>
          <INFO>This submission is a TEST submission and will be discarded within 24 hours</INFO>
     </MESSAGES>
     <ACTIONS>MODIFY</ACTIONS>
</RECEIPT>

However, I do get a success message on prod:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="receipt.xsl"?>
<RECEIPT receiptDate="2024-11-05T09:57:37.484Z" submissionFile="submission.xml" success="true">
     <SAMPLE accession="ERS21098997" alias="PP_000RGSG:west-nile:Pathoplexus" status="PUBLIC">
          <EXT_ID accession="SAMEA116100917" type="biosample"/>
     </SAMPLE>
     <SUBMISSION accession="" alias="SUBMISSION-05-11-2024-09:57:37:236"/>
     <MESSAGES/>
     <ACTIONS>MODIFY</ACTIONS>
</RECEIPT>
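Since the curl request itself completes in both cases, the outcome has to be read from the receipt. A minimal sketch of that check, based only on the two receipts above; the names `Receipt` and `parse_receipt` are hypothetical:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Receipt:
    success: bool
    errors: List[str]
    biosample_accessions: List[str]


def parse_receipt(receipt_xml: Union[str, bytes]) -> Receipt:
    """Extract the success flag, error messages and BioSample accessions."""
    root = ET.fromstring(receipt_xml)
    return Receipt(
        # The receipt carries success="true" or success="false" on its root.
        success=root.get("success") == "true",
        errors=[e.text or "" for e in root.findall("MESSAGES/ERROR")],
        biosample_accessions=[
            ext.get("accession", "")
            for ext in root.findall("SAMPLE/EXT_ID[@type='biosample']")
        ],
    )
```

Running this over `results/output.txt` (for example with `pathlib.Path("results/output.txt").read_bytes()`) and failing when `success` is false would have flagged the dev submission right away instead of only showing up as missing changes on the page.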

anna-parker commented 3 weeks ago

It worked on prod!!!! I updated the authors and taxon of https://www.ncbi.nlm.nih.gov/biosample/SAMEA116100917 and https://www.ncbi.nlm.nih.gov/biosample/SAMEA116120071, so both should now show up under West Nile in NCBI Virus.

image

corneliusroemer commented 2 weeks ago

One is mostly updated (except for Organism and Scientific Name, at least on the biosample page): https://www.ebi.ac.uk/ena/browser/view/SAMEA116100917

The other one is not yet changed: https://www.ebi.ac.uk/ena/browser/view/SAMEA116120071

Curiously, this one shows ENA first public Year 1000:

image

anna-parker commented 2 weeks ago

Both now show the updated lat and long values on ENA. However, the taxon has remained the same, and the changes are not yet visible on NIH.

anna-parker commented 2 weeks ago

Yay! All the updates have propagated to ENA and NIH!