This repository provides a sequence uploader for the COVID-19 Virtual Biohackathon's Public Sequence Resource project. There are two versions: one that runs on the command line and another that acts as a web interface. You can use it to upload the genomes of SARS-CoV-2 samples to make them publicly and freely available to other researchers. For more information, see the paper.
To get started, first install the uploader, then use the bh20-seq-uploader command to upload your data.
There are several ways to install the uploader. The most portable is with a virtualenv.
Installation with virtualenv
First install the system packages needed to build modules such as pycurl and pyopenssl. On Ubuntu 18.04, you can run:
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
virtualenv --python python3 venv
. venv/bin/activate
Note that you will need to repeat the . venv/bin/activate step from this directory to enter your virtualenv whenever you want to use the installed tool.
Install from PyPI:
pip3 install bh20-seq-uploader
Install from git:
pip3 install git+https://github.com/arvados/bh20-seq-resource.git@master
To test the installation, run:
bh20-seq-uploader --help
It should print some instructions about how to use the uploader.
Make sure you are in your virtualenv whenever you run the tool! If you ever can't run the tool, and your prompt doesn't say (venv), try going to the directory where you put the virtualenv and running . venv/bin/activate. Activation only applies to the current terminal window; you will need to run it again if you open a new terminal.
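If the prompt is ambiguous, the environment can be checked directly. The sketch below relies on the VIRTUAL_ENV variable, which the activate script sets and deactivate unsets; the function name in_virtualenv is made up for this example.

```shell
# Hypothetical helper: succeeds if a virtualenv is active in this shell.
# VIRTUAL_ENV is set by ". venv/bin/activate" and unset by "deactivate".
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "virtualenv active: $VIRTUAL_ENV"
else
    echo "no virtualenv active; run: . venv/bin/activate"
fi
```

The check only looks at the current shell, which matches the behaviour described above: a new terminal starts without VIRTUAL_ENV set.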
Installation with pip3 --user
If you don't want to have to enter a virtualenv every time you use the uploader, you can use the --user feature of pip3 to install the tool for your user.
As with the virtualenv method, you need to install some dependencies. On Ubuntu 18.04, you can run:
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
pip3 install --user git+https://github.com/arvados/bh20-seq-resource.git@master
Make sure the tool is on your PATH. The pip3 command will install the uploader in .local/bin inside your home directory. Your shell may not know to look for commands there by default. To fix this for the terminal you currently have open, run:
export PATH=$PATH:$HOME/.local/bin
To make this change permanent, assuming your shell is Bash, run:
echo 'export PATH=$PATH:$HOME/.local/bin' >>~/.bashrc
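Before changing anything, it can help to check whether .local/bin is already on your PATH, so the export line is not added twice. This is a minimal sketch in plain POSIX shell; path_contains is a hypothetical helper name, not part of the uploader.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry in the
# colon-separated list $2. Wrapping both sides in colons avoids matching
# a prefix such as /a/b against /a/bc.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/.local/bin" "$PATH"; then
    echo "$HOME/.local/bin is already on PATH"
else
    echo "$HOME/.local/bin is not on PATH yet"
fi
```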
To test the installation, run:
bh20-seq-uploader --help
It should print some instructions about how to use the uploader.
If you plan to contribute to the project, you may want to install an editable copy from source. With this method, changes to the source code are automatically reflected in the installed copy of the tool.
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
git clone https://github.com/arvados/bh20-seq-resource.git
cd bh20-seq-resource
virtualenv --python python3 venv
. venv/bin/activate
Note that you will need to repeat the . venv/bin/activate step from this directory to enter your virtualenv whenever you want to use the installed tool.
pip3 install -e .
To test the installation, run:
bh20-seq-uploader --help
It should print some instructions about how to use the uploader.
For running or developing the uploader with GNU Guix, see INSTALL.md.
Run the uploader with a FASTA or FASTQ file and accompanying metadata file in JSON or YAML:
bh20-seq-uploader example/metadata.yaml example/sequence.fasta
If the sample_id of your upload matches a sample already in PubSeq, it will be considered a new version and supersede the existing entry.
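Because an upload can replace an existing entry, you may want to sanity-check the two input files first. The sketch below only confirms that both files are readable and non-empty; it does not validate the metadata or the sequence, and the names check_upload_inputs and checked_upload are hypothetical.

```shell
# Hypothetical helper: succeeds only if both arguments are readable,
# non-empty files. It does not inspect their contents.
check_upload_inputs() {
    for f in "$1" "$2"; do
        if [ ! -r "$f" ] || [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
}

# Hypothetical wrapper: run the uploader only when the check passes.
# Arguments are in the same order as above: metadata file, then sequence file.
checked_upload() {
    check_upload_inputs "$1" "$2" && bh20-seq-uploader "$1" "$2"
}
```

With these defined, `checked_upload example/metadata.yaml example/sequence.fasta` behaves like the plain command but stops early if either file is missing.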
All these uploaded sequences are being fed into a workflow to generate a pangenome for the virus. You can replicate this workflow yourself.
For example, put your SARS-CoV-2 sequences from GenBank in seqs.fa, and then run this series of commands:
minimap2 -cx asm20 -X seqs.fa seqs.fa >seqs.paf
seqwish -s seqs.fa -p seqs.paf -g seqs.gfa
odgi build -g seqs.gfa -s -o seqs.odgi
odgi viz -i seqs.odgi -o seqs.png -x 4000 -y 500 -R -P 5
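The four commands above can be wrapped in a small shell function so that a failed step stops the run instead of feeding a missing file to the next tool. This is only a sketch: build_pangenome is a made-up name, the flags are copied unchanged from the commands above, and output names are derived from the input file name.

```shell
# Hypothetical wrapper around the four steps above. $1 is the input FASTA.
# Each step runs only if the previous one succeeded (&& chaining).
# Note: seqs and base are plain shell variables, not function-local.
build_pangenome() {
    seqs="$1"
    base="${seqs%.*}"   # seqs.fa -> seqs
    minimap2 -cx asm20 -X "$seqs" "$seqs" >"$base.paf" &&
    seqwish -s "$seqs" -p "$base.paf" -g "$base.gfa" &&
    odgi build -g "$base.gfa" -s -o "$base.odgi" &&
    odgi viz -i "$base.odgi" -o "$base.png" -x 4000 -y 500 -R -P 5
}
```

Calling `build_pangenome seqs.fa` then produces seqs.paf, seqs.gfa, seqs.odgi and seqs.png, the same files as the commands above.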
This pipeline has been converted into the Common Workflow Language (CWL); the sources can be found here.
For more information on building pangenome models, see this wiki page.
This project comes with a simple web server that lets you use the sequence uploader from a browser. It will work as long as you install the package with the web extra.
To run it locally:
virtualenv --python python3 venv
. venv/bin/activate
pip install -e ".[web]"
env FLASK_APP=bh20simplewebuploader/main.py flask run
Then visit http://127.0.0.1:5000/.
For production deployment, you can use gunicorn:
pip3 install gunicorn
gunicorn bh20simplewebuploader.main:app
This runs on http://127.0.0.1:8000/ by default, but can be adjusted with various gunicorn options.
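As one illustration of those options, the sketch below wraps gunicorn's standard --bind and --workers flags in a small function. The name serve_uploader and its argument order are hypothetical; the defaults mirror gunicorn's own (127.0.0.1:8000, one worker).

```shell
# Hypothetical wrapper: serve the web uploader on HOST:PORT with N workers.
# Usage: serve_uploader [HOST] [PORT] [WORKERS]
serve_uploader() {
    host="${1:-127.0.0.1}"
    port="${2:-8000}"
    workers="${3:-1}"
    gunicorn --bind "$host:$port" --workers "$workers" bh20simplewebuploader.main:app
}
```

For instance, `serve_uploader 0.0.0.0 8080 4` would listen on all interfaces on port 8080 with four worker processes. Suitable values depend on your deployment.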