metadatacenter / cedar-submission-server

CEDAR server to handle submissions to metadata repositories

AIRR NCBI FTP submission #5

Closed: martinjoconnor closed this issue 6 years ago

martinjoconnor commented 7 years ago

Add the NCBI user's email to the XML used to submit an AIRR instance to SRA.

Previously, we were going to provide the ability to supply a user name and password. This is no longer needed, because we now use the CEDAR account to submit to the NCBI FTP server and embed the user's email in the submission XML.
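
A minimal sketch of how the translation code could embed the user's email. It assumes the `Description` layout that Josef posts later in this thread (`Organization` → `Contact email` → `Name/First/Last`) and uses only the JDK's DOM API. `DescriptionBuilder` and `buildDescription` are hypothetical names, the `Submitter` element is omitted, and NCBI's attached example remains the authority on the schema.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper (not the actual CEDAR code): builds the Description block of submission.xml.
final class DescriptionBuilder {

    static String buildDescription(String comment, String ownerEmail, String firstName, String lastName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element description = append(doc, doc, "Description", null);
            append(doc, description, "Comment", comment);

            // Owning organization; per NCBI, the Contact email receives the ownership link.
            Element org = append(doc, description, "Organization", null);
            org.setAttribute("type", "lab");
            org.setAttribute("role", "owner");
            append(doc, org, "Name", "CEDAR");

            Element contact = append(doc, org, "Contact", null);
            contact.setAttribute("email", ownerEmail);
            Element name = append(doc, contact, "Name", null);
            append(doc, name, "First", firstName);
            append(doc, name, "Last", lastName);

            // Serialize without an XML declaration so the fragment can sit inside <Submission>.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the Description block", e);
        }
    }

    // Creates a child element (with optional text content) and appends it to the parent node.
    private static Element append(Document doc, Node parent, String tag, String text) {
        Element element = doc.createElement(tag);
        if (text != null) {
            element.setTextContent(text);
        }
        parent.appendChild(element);
        return element;
    }
}
```

Building the fragment through DOM rather than string concatenation means characters such as `&` or `<` in names are escaped automatically.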

The current FTP submission code is here:

https://github.com/metadatacenter/cedar-submission-server/blob/develop/cedar-submission-server-application/src/main/java/org/metadatacenter/cedar/submission/resources/AIRRSubmissionServerResource.java

The AIRR template is here:

https://cedar.staging.metadatacenter.net/templates/edit/https://repo.staging.metadatacenter.net/templates/d7c9d050-a4aa-4448-b7ed-cf58b980baf2?folderId=https:%2F%2Frepo.staging.metadatacenter.net%2Ffolders%2Ff77e5f5e-0bce-4e52-8415-a684614b9461

IMPORTANT: the JSON-to-XML translation code in the submission server expects the template to be exactly as specified here. Any changes to the template must be accompanied by changes to the translation code.

martinjoconnor commented 7 years ago

Related email:

Hello,

I apologize for the delay. We have set up the CEDAR center account as a broker account. In order for the PI to take ownership of their submission, you need to get their email address, preferably with their first and last names. In the submission.xml file you generate, there is a “Description” block at the beginning where you enter the contact email. You would enter the owner’s email into that part of the XML. Also, I took a look at your test folder.

To create a submission, you need to create a folder for that submission in the submit/Test directory. The XML spuid_namespace should then be set to “cedar”, and the action block that relates to SRA must list its files. Also, both the SRA and BioSample actions are missing their Identifier blocks.

I have attached a simple example of a submission.xml file that would create a BioProject, a BioSample, and an SRA record that ties them all together. Please use it as a guide to make sure all the Identifier blocks are included in the correct places.
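
NCBI's checklist here (spuid_namespace set to “cedar”, files listed in the SRA action, Identifier blocks in the SRA and BioSample actions) lends itself to a pre-flight check before uploading. A minimal sketch, assuming NCBI's submission schema wraps `AddData`/`AddFiles` elements carrying a `target_db` attribute inside `Action` elements, lists SRA files as `File` children, and nests identifiers as `Identifier/SPUID`. These element names should be verified against the attached example; `SubmissionChecker` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical pre-flight check for the problems NCBI reported; element names are assumptions.
final class SubmissionChecker {

    static List<String> check(String submissionXml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(submissionXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("submission.xml is not well-formed", e);
        }
        List<String> problems = new ArrayList<>();
        for (Element action : children(root, "Action")) {
            for (Element op : children(action, null)) { // e.g. AddData, AddFiles
                String target = op.getAttribute("target_db");
                if (!target.equals("SRA") && !target.equals("BioSample")) {
                    continue; // only the two actions NCBI flagged are checked here
                }
                List<Element> spuids = new ArrayList<>();
                for (Element identifier : children(op, "Identifier")) {
                    spuids.addAll(children(identifier, "SPUID"));
                }
                if (spuids.isEmpty()) {
                    problems.add(target + ": missing Identifier block");
                }
                for (Element spuid : spuids) {
                    if (!"cedar".equals(spuid.getAttribute("spuid_namespace"))) {
                        problems.add(target + ": spuid_namespace should be \"cedar\"");
                    }
                }
                if (target.equals("SRA") && children(op, "File").isEmpty()) {
                    problems.add("SRA: action lists no files");
                }
            }
        }
        return problems;
    }

    // Direct child elements of parent, optionally filtered by tag name (null matches any tag).
    private static List<Element> children(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && (tagName == null || tagName.equals(n.getNodeName()))) {
                result.add((Element) n);
            }
        }
        return result;
    }
}
```

An empty result list means none of the flagged problems were found; it does not mean NCBI will accept the submission.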

If you will be testing submissions please notify me so that I can help troubleshoot configuration and XML errors.

Best,
Yuriy

From: John Graybeal [mailto:jgraybeal@stanford.edu]
Sent: Wednesday, May 17, 2017 6:59 PM
To: Skripchenko, Yuriy (NIH/NLM/NCBI) [C] yuriy.skripchenko@nih.gov
Cc: Busby, Ben (NIH/NLM/NCBI) [E] busbybr@ncbi.nlm.nih.gov; John Graybeal jgraybeal@stanford.edu; Martin O'Connor sunid@stanford.edu; Kei-Hoi Cheung kei.cheung@yale.edu; Bukhari, Syed Ahmad Chan ahmad.chan@yale.edu
Subject: Fwd: NCBI FTP

Hi Yuriy,

We've all been discussing your input, thanks much!

> However, we can set up your center account so that when individuals submit their data through your system they will receive an email with a link to take ownership of the submission. This way the individual will own the submission and not your center. I believe this is exactly what you need.

This approach sounds perfect.

Is this just a switch you throw on your end to make the CEDAR NCBI FTP account behave in this way? If so, we would like you to do this.

Or does it require custom development work on your end? If so, could you give us a ballpark timeframe for when this feature would be available?

Oh, and we have a question about the source of the "user email". We assume you are getting this email from a particular field in the submission. How will we agree on which field to use?

Note that for a prototype, having the user email come from an existing field in the metadata is probably workable. But going forward, how will we handle a submitter/contact person who is different from the project owner/principal investigator? We know that is something we will have to support in the real world.

Thanks much!

Martin and John, on behalf of Kei and Ahmad

martinjoconnor commented 7 years ago

Submission instructions from NCBI (including Aspera-based submissions)

Simplified XML Submission.docx

johardi commented 7 years ago

One idea is to use Aspera: http://asperasoft.com/software/transfer-clients/connect-web-browser-plug-in/. We need to investigate whether this plug-in can be embedded in the CEDAR editor.

martinjoconnor commented 7 years ago

Email to Yuriy on June 28th:

Hello Yuriy,

I have some questions about the NCBI submission system based on my observations from earlier experiments:

1. Which email address does the system use to send the notification? The system seems to ignore the email address in the submission.xml. My guess is that it uses the one from the BioProject registration. Can you confirm this?
2. The email notification mentions the submission link (see below), but the content is not what I submitted. Am I missing something?
   Your BioSample records will be accessible with the following link:
   http://www.ncbi.nlm.nih.gov/biosample/2119324
3. The email notification and the report.xml don’t seem to agree with each other (see attachment). The report said there was an error, but the notification said otherwise. Which one should we trust?
4. Can the system handle compressed files (e.g., zip, tar.gz)?

Best,
Josef

martinjoconnor commented 7 years ago

Response from Yuriy on June 29th:

Hi Josef,

1. Notifications are sent to the email address assigned to the center and to any member of the center account (since the test system does not allow outside logins, that is only me and the service account we registered). In your case, the center email account is set to John’s email. Did it not go to John’s email?
2. Testing is confusing here because the link is intended for production. Since your sample is a test sample, you won’t be able to see it on an Entrez web page, as we do not have an outward-facing test Entrez system. The test system uses a new counter for samples, so the link will either take you to a sample that someone else has already registered or be a dead link if that sample was removed.
3. Each action is sent to a separate backend database. You had an action for BioSample and one for SRA. The BioSample action succeeded, as the email notification indicated, which allowed the SRA action to be processed. The SRA action failed due to recent changes in our pipeline, which we will sort out. We process the actions in a submission according to a hierarchy: BioProject, then BioSample, then SRA. If the two levels above it are not processed, the SRA submission will not be processed either, because we do not want to store files without sufficient metadata to make them useful to the community. The report.xml file is what should be checked, as it contains the responses from all the backend databases you submit to.
4. Yes, we can handle gz, bz2, tar, and tar.gz. If you upload tar files, you still need to provide the individual file names in the XML. So if you have paired-end files, the XML will contain two file tags, one for each fastq file, but you can upload one tar file containing both. Also, please do not create complicated directory structures in the tar files. A tar of a single directory that contains the files will work, but multiple directories or a nested directory tree will cause problems.
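
Point 4 can be turned into a client-side check before upload. The sketch below is an interpretation, not an NCBI tool: it accepts only the suffixes listed above (zip is not among them) and treats an archive as acceptable when it holds loose files plus at most one directory with no nesting. It takes archive entry names as input (for example, the output of `tar -tf`) because the JDK has no built-in tar reader. `ArchiveChecks` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical client-side checks based on NCBI's reply; not an NCBI API.
final class ArchiveChecks {

    // Formats NCBI said it accepts; zip was asked about but not listed, so it is rejected here.
    private static final List<String> SUPPORTED_SUFFIXES = List.of(".tar.gz", ".gz", ".bz2", ".tar");

    static boolean isSupportedArchive(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return SUPPORTED_SUFFIXES.stream().anyMatch(lower::endsWith);
    }

    // Accepts loose files plus at most one top-level directory, with no nested directories.
    static boolean hasSimpleLayout(List<String> entryNames) {
        Set<String> topLevelDirs = new HashSet<>();
        for (String entry : entryNames) {
            String path = entry.startsWith("./") ? entry.substring(2) : entry;
            boolean isDirectory = path.endsWith("/");
            if (isDirectory) {
                path = path.substring(0, path.length() - 1);
            }
            if (path.isEmpty() || path.equals(".")) {
                continue; // entry for the archive root itself
            }
            String[] parts = path.split("/");
            int directoryDepth = isDirectory ? parts.length : parts.length - 1;
            if (directoryDepth > 1) {
                return false; // nested directory tree
            }
            if (directoryDepth == 1) {
                topLevelDirs.add(parts[0]);
            }
        }
        return topLevelDirs.size() <= 1; // several directories would also cause problems
    }
}
```

Whether NCBI tolerates loose files next to a single directory was not stated; the sketch allows it, which is a judgment call.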

Since your center account is set up to act as a broker for submitters, you should have received an email to take ownership of the submission, and that email should have gone out to you, Josef. Did you receive this email?

Best,
Yuriy
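
The processing order Yuriy describes in point 3 explains why the BioSample notification and report.xml disagreed. As an illustration only (an assumed model of the rule as stated, not NCBI's code), each action runs in hierarchy order and is not processed once a higher-level action has failed. `ActionHierarchy` and its enums are hypothetical names.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative model of the BioProject -> BioSample -> SRA rule; not NCBI's implementation.
final class ActionHierarchy {

    // Declaration order is the processing order.
    enum TargetDb { BIOPROJECT, BIOSAMPLE, SRA }

    enum Outcome { SUCCEEDED, FAILED, NOT_PROCESSED }

    // backendAccepts: for each action present in the submission, whether its backend would accept it.
    static Map<TargetDb, Outcome> process(Map<TargetDb, Boolean> backendAccepts) {
        Map<TargetDb, Outcome> outcomes = new EnumMap<>(TargetDb.class);
        boolean higherLevelsOk = true;
        for (TargetDb db : TargetDb.values()) {
            Boolean accepts = backendAccepts.get(db);
            if (accepts == null) {
                continue; // the submission has no action for this database
            }
            if (!higherLevelsOk) {
                outcomes.put(db, Outcome.NOT_PROCESSED); // e.g. SRA files are not stored without metadata
                continue;
            }
            outcomes.put(db, accepts ? Outcome.SUCCEEDED : Outcome.FAILED);
            higherLevelsOk = accepts;
        }
        return outcomes;
    }
}
```

For Josef's test (BioSample accepted, SRA rejected), `process` yields SUCCEEDED for BIOSAMPLE and FAILED for SRA. A notification about one action therefore cannot describe the whole submission, which is why report.xml, with one response per backend, is the file to check.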

martinjoconnor commented 7 years ago

Questions from Josef on June 29th:

Hi Yuriy,

Thank you for your reply! I understand the system better now, but I still have a few questions about the notifications. To answer your questions: (1) yes, the email notification went to John’s email, and (2) no, I haven’t received any email about taking ownership of the submission. Could you please describe the business process for being a broker in the system? If you were to describe it as a sequence diagram, what would it look like?

Also, could you confirm that the system parses the Description block below from the submission.xml? I’ve been told that the email notification will be based on that information; is this true? If not, how is the notification system implemented?

<Description>
   <Comment>This is dummy AIRR submission to test the CEDAR workbench</Comment>
   <Submitter user_name="johardi"/>
   <Organization type="lab" role="owner">
      <Name>CEDAR</Name>
      <Contact email="johardi@stanford.edu">
         <Name>
            <First>Josef</First>
            <Last>Hardi</Last>
         </Name>
      </Contact>
   </Organization>
</Description>

Best,
Josef

martinjoconnor commented 7 years ago

Related to task #11.

martinjoconnor commented 6 years ago

Email from July 12th:


Dear Josef,

As the developers explained to me when we implemented this feature, the Description block would be parsed for the “<Contact email="johardi@stanford.edu">” tag and an email would be sent to the specified address. Once the link in the email is clicked, a database update changes the ownership ID from the center account to the user’s ID. The change in ownership is then propagated to the backend databases.

I have contacted the developers to make sure that the above is still true and has not changed, and to double-check that CEDAR is set up as a broker account.

Thank you,

Yuriy Skripchenko
SRA Curator
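
The parsing Yuriy describes amounts to reading one attribute from the Description block. A minimal sketch using the JDK's XPath API, assuming the element layout from Josef's example (`Description/Organization/Contact/@email`). `ContactEmailReader` is a hypothetical name, and NCBI's actual parser may behave differently, for instance when several Contact elements are present (this sketch returns the first).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical reader mirroring the described behavior; not NCBI's parser.
final class ContactEmailReader {

    // Matches Josef's layout whether Description is the root or sits inside <Submission>.
    private static final String CONTACT_EMAIL_XPATH = "//Description/Organization/Contact/@email";

    static Optional<String> contactEmail(String submissionXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(submissionXml)));
            // evaluate(...) returns the string value of the first match, or "" if nothing matches.
            String email = XPathFactory.newInstance().newXPath()
                    .evaluate(CONTACT_EMAIL_XPATH, doc).trim();
            return email.isEmpty() ? Optional.empty() : Optional.of(email);
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not read submission.xml", e);
        }
    }
}
```

Running this against a generated submission.xml before upload is one way to confirm that the owner's address ended up where NCBI looks for it.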