pb-jlandolin / PacbioToSRA

Takes a list of PacBio files (.fofn) and creates a spreadsheet for data submission to the Sequence Read Archive (SRA).

PacbioToSRA

This repo contains scripts, instructions, and examples for preparing PacBio sequence data for submission to the SRA.

Instructions

  1. Register project and samples
  2. Prepare script's environment
  3. Run the script
  4. Update spreadsheet and email it to NCBI

Step 1. Register project and samples

Go to https://submit.ncbi.nlm.nih.gov/ and register your BioProject.
Go to https://submit.ncbi.nlm.nih.gov/ and register your BioSample.

Step 2. Prepare script's environment

Set up the virtual environment:

(go to the root directory of this repo)
$ virtualenv virtualenv_PacbioToSRA
$ source virtualenv_PacbioToSRA/bin/activate
$ pip install -r requirements.txt

Step 3. Run the script

Usage:

$ bin/pacb_ncbi --help
Usage: pacb_ncbi [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  calc_upload_size              Calculates the total size of the data that...
  create_excel_file             Creates the Excel file that contains the...
  create_excel_file_and_upload  Creates the Excel file that contains the...
  upload                        Uploads the datasets in the input.fofn file...

Example:

$ bin/pacb_ncbi create_excel_file_and_upload -i /path/to/input.fofn -p bioproject1 -s biosample1 -x my_sra.xlsx -u ncbi_username -k /path/to/ssh/file
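
Before uploading, it can help to confirm that every file listed in input.fofn exists and to get a rough idea of the total size. The sketch below is not the implementation behind calc_upload_size. It is a standalone illustration that assumes the usual .fofn convention of one file path per line, and the helper names read_fofn and total_size_bytes are made up for this example.

```python
import os
import tempfile


def read_fofn(fofn_path):
    """Return the non-empty lines of a .fofn file as a list of paths.

    Assumption: the .fofn holds one file path per line.
    """
    with open(fofn_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def total_size_bytes(paths):
    """Sum the on-disk sizes of the listed files.

    Raises FileNotFoundError if any listed path is not an existing file,
    so problems surface before an upload is attempted.
    """
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("Listed in fofn but not found: " + ", ".join(missing))
    return sum(os.path.getsize(p) for p in paths)


if __name__ == "__main__":
    # Small demo using a temporary placeholder file instead of real PacBio data.
    with tempfile.TemporaryDirectory() as tmp:
        data_file = os.path.join(tmp, "example.dat")
        with open(data_file, "wb") as out:
            out.write(b"\0" * 1024)

        fofn = os.path.join(tmp, "input.fofn")
        with open(fofn, "w") as out:
            out.write(data_file + "\n")

        print(total_size_bytes(read_fofn(fofn)), "bytes")
```

To check a real submission, point read_fofn at the same file you pass with -i.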

Notes:

Step 4. Update spreadsheet and email it to NCBI