metadatacenter / cedar-submission-server

CEDAR server to handle submissions to metadata repositories

Add AIRR SRA FTP upload functionality #1

Closed martinjoconnor closed 7 years ago

martinjoconnor commented 7 years ago

Add REST endpoint to accept CEDAR AIRR instance plus a raw data file and submit it to NCBI's FTP server.

Associated front end task is metadatacenter/cedar-template-editor#558

martinjoconnor commented 7 years ago

Some notes:

Here is the FTP information:

Username: CEDAR
Password: In CEDAR Stash
Address: ftp-private.ncbi.nlm.nih.gov

Submission Procedure

You will have a Test area and a Production area. We recommend submitting only into the Test area until you have created several successful submissions. To create a submission, first create a directory with the Submission name in either the Test or Production area, then deposit the XML file and the related data files into that directory. Once the submission.xml file and the data files are uploaded into the directory, create an empty file with the name “submit.ready”. The empty “submit.ready” file is the trigger that lets the pipeline know the submission is ready for processing.

While the pipeline is processing the submission it will create “report.[version].xml” files and a symlink (report.xml) to the latest report file. The report file will contain status updates, error messages and accessions.
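The upload order described above (create the directory, deposit the files, create the empty trigger last) can be sketched roughly as follows with Python's standard ftplib. This is only a sketch under assumptions: the function name `upload_submission` is hypothetical, `ftp` is taken to be an already logged-in `ftplib.FTP` connection, and the area directories are assumed to be named `Test` and `Production` relative to the login directory, which the notes do not spell out for FTP.

```python
import io
import os


def upload_submission(ftp, area, submission_name, local_paths):
    """Upload submission files, then create the empty submit.ready trigger.

    `ftp` is expected to be a logged-in ftplib.FTP connection (or any object
    with the same cwd/mkd/storbinary methods).
    """
    # Assumption: the two areas appear as "Test" and "Production" directories.
    if area not in ("Test", "Production"):
        raise ValueError("area must be 'Test' or 'Production'")

    ftp.cwd(area)
    ftp.mkd(submission_name)      # directory named after the submission
    ftp.cwd(submission_name)

    # Deposit submission.xml and the related data files.
    for path in local_paths:
        with open(path, "rb") as handle:
            ftp.storbinary("STOR " + os.path.basename(path), handle)

    # The empty trigger goes last so the pipeline never sees a partial upload.
    ftp.storbinary("STOR submit.ready", io.BytesIO(b""))


# Possible usage (not executed here; the password lives in CEDAR Stash):
#
#   from ftplib import FTP
#   with FTP("ftp-private.ncbi.nlm.nih.gov") as ftp:
#       ftp.login("CEDAR", password)
#       upload_submission(ftp, "Test", "my_submission",
#                         ["submission.xml", "file.fastq"])
```

Passing the connection in, rather than opening it inside the function, keeps the ordering logic testable without touching the real server.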

Creating the Submission.xml file

The submission.xml file can either link to existing BioProject/BioSamples or register new BioProject/BioSamples. The XML file should only create a single BioProject, but can have any number of BioSample and SRA components. The XML file is broken up into “Action” blocks, and each “Action” is an instruction to the database that is referenced inside the “Action” block. Here is a simple Action block for SRA (http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/submit/public-docs/sra/samples/sra.submission.run.xml?view=co&revision=71838&content-type=text%2Fplain). This Action will generate an SRA Experiment and Run. The Run will contain the files that are referenced in the “File” tag(s). The SRA metadata will be linked to an existing BioProject and BioSample through their accessions. Here you can find all the example XML files: http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/submit/public-docs/sra/samples/
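To make the shape of an SRA “Action” block more concrete, here is a rough sketch that builds one with Python's standard xml.etree.ElementTree. Only the “Action” and “File” names come from the notes above; the other element and attribute names (`AddFiles`, `target_db`, `file_path`, `DataType`, `AttributeRefId`, `RefId`, `PrimaryId`) are recalled from NCBI's sample files and are assumptions that should be checked against the Submission XSD. The helper name `build_sra_action` and both accessions are hypothetical placeholders, and a real submission would also need the other parts shown in the samples (for example a description and identifiers).

```python
import xml.etree.ElementTree as ET


def build_sra_action(file_names, bioproject_acc, biosample_acc):
    """Build one Action block that adds files to SRA and links existing records.

    Element names other than Action and File are assumptions; verify them
    against the Submission XSD before use.
    """
    action = ET.Element("Action")
    add_files = ET.SubElement(action, "AddFiles", target_db="SRA")

    # One File element per data file that the Run should contain.
    for name in file_names:
        file_el = ET.SubElement(add_files, "File", file_path=name)
        ET.SubElement(file_el, "DataType").text = "generic-data"

    # Link to the existing BioProject and BioSample through their accessions.
    for db, accession in (("BioProject", bioproject_acc),
                          ("BioSample", biosample_acc)):
        ref = ET.SubElement(add_files, "AttributeRefId", name=db)
        ref_id = ET.SubElement(ref, "RefId")
        ET.SubElement(ref_id, "PrimaryId", db=db).text = accession

    return action


# Placeholder accessions, for illustration only.
submission = ET.Element("Submission")
submission.append(build_sra_action(["file.fastq"], "PRJNA000000", "SAMN00000000"))
xml_text = ET.tostring(submission, encoding="unicode")
print(xml_text)
```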

To verify that you formatted the submission.xml file correctly you can download the Submission XSD from here. The BioProject XSD can be downloaded from here. It is best used to determine the necessary attributes for the BioProject that you wish to register. The required BioSample attributes can be found by looking here and then selecting the sample model that best fits your sample. The required attributes must be in the Action block that relates to the BioSample database. You can add any additional attributes you wish, with any name. You can download the BioSample XSD from here.

Aspera Upload Instructions

To have the submission processed, use the following command line:

/opt/aspera/bin/ascp -i ~/.ssh/ -QT -l100m -k1 -d /file.fastq asp-<center_abbr>@upload.ncbi.nlm.nih.gov:submit/Test/<submission_dir_name>

If all the files are in a single directory you can upload the entire directory like so:

/opt/aspera/bin/ascp -i ~/.ssh/ -QT -l100m -k1 -d / asp-<center_abbr>@upload.ncbi.nlm.nih.gov:submit/Test/<submission_dir_name>

For Production uploads the command line changes to:

/opt/aspera/bin/ascp -i ~/.ssh/ -QT -l100m -k1 -d /file.fastq asp-<center_abbr>@upload.ncbi.nlm.nih.gov:submit/Production/<submission_dir_name>
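If the server ends up shelling out to Aspera, the three command lines above differ only in the source path and the Test/Production segment, so they could be assembled by one small helper. The following is a sketch only: the function name `build_ascp_command` and every argument value in the example call are hypothetical, and the flags are copied from the instructions above rather than checked against the ascp documentation.

```python
import shlex

ASCP = "/opt/aspera/bin/ascp"
UPLOAD_HOST = "upload.ncbi.nlm.nih.gov"


def build_ascp_command(key_path, source, remote_user, area, submission_dir):
    """Return the ascp argument list for uploading a file or directory."""
    if area not in ("Test", "Production"):
        raise ValueError("area must be 'Test' or 'Production'")
    destination = f"{remote_user}@{UPLOAD_HOST}:submit/{area}/{submission_dir}"
    # Flags taken verbatim from the NCBI instructions quoted above.
    return [ASCP, "-i", key_path, "-QT", "-l100m", "-k1", "-d",
            source, destination]


# Hypothetical values, for illustration only.
cmd = build_ascp_command("/path/to/private_key", "/data/file.fastq",
                         "asp-example", "Test", "my_submission")
print(shlex.join(cmd))
```

Returning an argument list rather than a single string means it can be handed to `subprocess.run(cmd, check=True)` without shell quoting problems.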

martinjoconnor commented 7 years ago

Submission instructions also here:

https://docs.google.com/document/d/1tmPinCgaTwBkTsOwjitquFc0ZUN65w5xZs30q5phRkY/edit