Olivar multiplex PCR tiling design

Description

Olivar is a Python tool for multiplex PCR tiling design. Olivar first builds an index for each target of interest, incorporating undesired sequence features such as homologous regions, SNPs and extreme GC content. Olivar then designs tiled amplicons covering a single index or multiple indexes, and minimizes primer dimers with the SADDLE algorithm. Olivar is published as an article in Nature Communications.

Web Interface

A web interface is available at olivar.rice.edu, although it does not support all available functions at the moment.

Install with Bioconda (Linux x64 or Mac Intel chip)

1. Install Miniconda if not installed already (quick command line install)

2. Create a new Conda environment named "olivar" and install Olivar via Bioconda

conda create -n olivar olivar --channel conda-forge --channel bioconda --channel defaults --strict-channel-priority

[!TIP] Setting channel priority is important for Bioconda packages to function properly. You may also persist channel priority settings for all package installation by modifying your ~/.condarc file. For more information, check the Bioconda documentation.

3. Activate the new Conda environment and run Olivar

conda activate olivar
olivar --help

Dependencies

python >=3.8
blast >=2.12.0
biopython
numpy <2
pandas
plotly >=5.13.0
tqdm

Reproducibility

To reproduce the results in example_output (primers used in the publication), specify package versions during installation and run example.py.

conda create -n olivar olivar=1.1.5 blast=2.13.0 numpy=1 --channel conda-forge --channel bioconda --strict-channel-priority

[!CAUTION] Git LFS is needed to clone the example BLAST database. Without Git LFS, blastn won't run on the incomplete example BLAST database and Olivar will raise IndexError: list index out of range.
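
One way to catch this before running Olivar is to check whether the database files are still Git LFS pointer stubs. The following is a minimal sketch, assuming the standard Git LFS pointer format (a small text file whose first line starts with `version https://git-lfs.github.com/spec/v1`). The function names are illustrative and are not part of Olivar.

```python
from pathlib import Path

# First bytes of a Git LFS pointer file (assumption: standard pointer format)
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an unfetched Git LFS pointer stub."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE


def find_lfs_pointers(directory):
    """List files under a directory that are still LFS pointer stubs."""
    return sorted(
        str(p)
        for p in Path(directory).rglob("*")
        if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_lfs_pointers("example_input/Human")` and getting an empty list suggests the database files were fetched; any listed file probably still needs `git lfs pull`.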

Usage

Input files

(Required) Reference sequence for each target for tiling, in FASTA format (example). Ambiguous bases are not supported and may raise errors.
(Optional) A list of sequence variations to be avoided for each reference, in csv format (example). Columns "START" and "STOP" are required; "FREQ" is treated as 1.0 if empty. Other columns are not required. Coordinates are 1-based.
(Optional) A BLAST database of non-specific sequences. More details can be found in Prepare a BLAST database.
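
As an illustration of the variation file layout described above, here is a minimal sketch that writes a csv with the "START", "STOP" and "FREQ" columns using the Python standard library. The helper name and any positions passed to it are hypothetical; only the column names and the 1-based convention come from this document.

```python
import csv


def write_variation_csv(path, variations):
    """Write (start, stop, freq) tuples using the START/STOP/FREQ columns.

    Coordinates are 1-based; freq may be None to leave FREQ empty.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["START", "STOP", "FREQ"])
        for start, stop, freq in variations:
            # Reject intervals that cannot be valid 1-based closed coordinates
            if start < 1 or stop < start:
                raise ValueError(f"invalid 1-based interval: {start}-{stop}")
            writer.writerow([start, stop, "" if freq is None else freq])
```

For example, `write_variation_csv("my_variations.csv", [(241, 241, 0.98), (3037, 3037, None)])` writes two single-base variations, the second with an empty "FREQ".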

[!NOTE] Coordinates are always 1-based, closed intervals, except for the output .primer.bed file, which is in BED format.
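
The difference between the two conventions can be sketched as follows, assuming the usual BED definition (0-based start, end-exclusive). The function names are illustrative and not part of Olivar.

```python
def closed_to_bed(start, stop):
    """Convert a 1-based closed interval to BED (0-based, half-open)."""
    if start < 1 or stop < start:
        raise ValueError(f"invalid 1-based interval: {start}-{stop}")
    return start - 1, stop


def bed_to_closed(bed_start, bed_end):
    """Convert a BED interval back to 1-based closed coordinates."""
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError(f"invalid BED interval: {bed_start}-{bed_end}")
    return bed_start + 1, bed_end
```

A primer at positions 1 to 20 in the csv output would therefore appear as start 0, end 20 in the BED file; the length (20) is the same in both.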

Command-line interface

The Olivar CLI comprises four sub-commands: build, tiling, save and validate. Descriptions of command-line arguments can be found in Command-line parameters.

[!TIP] build, tiling, and validate support multiprocessing with the -p option.

1. Build Olivar reference

A reference sequence in FASTA format is required; coordinates of sequence variations and a BLAST database are optional. Only the first FASTA record is considered.

olivar build example_input/EPI_ISL_402124.fasta -v example_input/delta_omicron_loc.csv -d example_input/Human/GRCh38_primary -o example_output -p 1

An Olivar reference file (EPI_ISL_402124.olvr) will be generated, named by the ID of the FASTA record by default. Use multiple CPU cores (-p) to accelerate this process.

If you have multiple targets, run olivar build on each FASTA file and place all output .olvr files in the same directory.

In this step, the input reference sequence is chopped into kmers, and GC content, sequence complexity and BLAST hits are calculated for each kmer. Sequence variations are also labeled if coordinates are provided. A risk score is assigned to each nucleotide of the reference sequence, guiding the placement of primer design regions.
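
To make the k-mer step concrete, here is a minimal sketch that chops a sequence into overlapping k-mers and computes the GC fraction of each. The k-mer length, the step size of 1 and the function names are assumptions for illustration; Olivar's actual k-mer length, complexity measure, BLAST scoring and risk calculation are not shown here.

```python
def iter_kmers(seq, k):
    """Yield (1-based start, k-mer) for every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]


def gc_content(kmer):
    """Fraction of G/C bases in a sequence (0.0 for an empty string)."""
    if not kmer:
        return 0.0
    kmer = kmer.upper()
    return (kmer.count("G") + kmer.count("C")) / len(kmer)


def gc_profile(seq, k=8):
    """GC fraction of each overlapping k-mer, keyed by 1-based start."""
    return [(start, gc_content(kmer)) for start, kmer in iter_kmers(seq, k)]
```

K-mers with a GC fraction far from the middle of the range would be the ones contributing to an "extreme GC content" component in a scheme like this.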

2. Design tiled amplicons

Input a single Olivar reference file generated in step 1, or a directory of multiple .olvr files (needs version ≥ 1.2). Set random seed (--seed) to make the results reproducible. Use multiple CPU cores (-p) to accelerate this process. Output files are listed below (coordinates are 1-based).

olivar tiling example_output/EPI_ISL_402124.olvr -o example_output --max-amp-len 420 --min-amp-len 252 --check-var --seed 10 -p 1

Default name	Description
olivar-design.olvd	Olivar design file, keeping all intermediate results during the design.
olivar-design.csv	Sequences, coordinates (1-based) and pool assignment of primers, inserts and amplicons.
olivar-design.primer.bed	Primer sequences and coordinates in ARTIC/PrimalScheme (BED) format.
olivar-design_SADDLE_Loss.html	Learning curve for primer dimer optimization.
olivar-design.json	Design configurations.
EPI_ISL_402124.fasta	Reference sequence.
EPI_ISL_402124.html	An interactive plot to view primers and the risk array.
EPI_ISL_402124_PDR_Loss.html	Learning curve for PDR optimization.
EPI_ISL_402124_risk.csv	Risk scores of each risk component.

"olivar-design" is the name of the whole design (might contain multiple targets), and "EPI_ISL_402124" is the name of a single target, determined by the ID of the reference FASTA record by default (see step 1).

In this step, the placement of primer design regions (PDRs) is optimized based on the risk array (Fig.1d), and primer candidates are generated by SADDLE for each PDR in the optimized PDR set. SADDLE also minimizes primer dimers by exploring different combinations of primer candidates.

(Optional) Load from a previous Olivar design and save output files

Output files in step 2 can be generated repeatedly as long as the Olivar design file (.olvd) is provided.

olivar save example_output/olivar-design.olvd -o example_output

[!WARNING] .olvr and .olvd files are generated with pickle. Do NOT load those files from untrusted sources.

(Optional) Validate existing primer pools

Input should be a csv file with four required columns: "amplicon_id" (amplicon name), "fP" (sequence of forward primer), "rP" (sequence of reverse primer) and "pool" (primer pool number, e.g., 1). This could be an Olivar-designed primer pool generated in step 2, or primer pools that are not designed by Olivar. Output files are listed below (coordinates are 1-based). Use multiple CPU cores (-p) to accelerate this process.

olivar validate example_output/olivar-design.csv --pool 1 -d example_input/Human/GRCh38_primary -o example_output -p 1

Default name	Description
olivar-val_pool-1.csv	Basic information of each single primer, including dG, dimer score, BLAST hits, etc.
olivar-val_pool-1_ns-amp.csv	Predicted non-specific amplicons.
olivar-val_pool-1_ns-pair.csv	Predicted non-specific primer pairs.
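
For primer pools that were not designed by Olivar, it can help to check the input csv before validation. The following is a minimal sketch that verifies the four required columns and selects the rows of one pool. The function name is hypothetical, and comparing the "pool" value as text is an assumption; Olivar's own parsing may differ.

```python
import csv

REQUIRED_COLUMNS = ("amplicon_id", "fP", "rP", "pool")


def read_primer_pool(path, pool=1):
    """Return rows belonging to `pool`, after checking required columns."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        # Assumption: pool numbers are written as plain integers, e.g. "1"
        return [row for row in reader if row["pool"].strip() == str(pool)]
```

An empty result for a given pool number is a hint that the `--pool` value does not match the "pool" column of the file.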

Import Olivar as a Python package

Olivar can also be imported as a Python package, which provides four functions with the same names and parameters as the four sub-commands in the CLI.

from olivar import build, tiling, save, validate

Refer to example.py for more details.
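
When importing the package is not convenient (for example, when Olivar lives in a separate Conda environment), the CLI can also be driven from a script. The sketch below assembles an `olivar build` command from the flags documented in Command-line parameters and runs it with `subprocess`. The helper names are hypothetical, and this is not a substitute for the Python functions shown in example.py.

```python
import subprocess


def olivar_build_command(fasta, var=None, db=None, output="./", title=None, threads=1):
    """Assemble the argument list for `olivar build` from the documented flags."""
    cmd = ["olivar", "build", fasta, "--output", output, "--threads", str(threads)]
    if var is not None:
        cmd += ["--var", var]
    if db is not None:
        cmd += ["--db", db]
    if title is not None:
        cmd += ["--title", title]
    return cmd


def run_command(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

For instance, `run_command(olivar_build_command("example_input/EPI_ISL_402124.fasta", output="example_output"))` would run the build step without variations or a BLAST database, provided `olivar` is on the PATH.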

Command-line parameters

sub-command: `build`

olivar build fasta-file [--var <string>] [--db <string>] [--output <string>] 
[--title <string>] [--threads <int>]

Argument	Default	Description
fasta-file		Positional argument. Path to the FASTA reference sequence.
--var, -v	None	Optional, path to the csv file of SNP coordinates and frequencies. Required columns: "START", "STOP", "FREQ". "FREQ" is considered as 1.0 if empty. Coordinates are 1-based.
--db, -d	None	Optional, path to the BLAST database. Note that this path should end with the name of the BLAST database (e.g., "example_input/Human/GRCh38_primary").
--output, -o	./	Output directory (output to current directory by default).
--title, -t	FASTA record ID	Name of the Olivar reference file.
--threads, -p	1	Number of threads.

sub-command: `tiling`

olivar tiling olvr-path [--output <string>] [--title <string>] [--max-amp-len <int>] 
[--min-amp-len <int>] [--w-egc <float>] [--w-lc <float>] [--w-ns <float>] [--w-var <float>] 
[--temperature <float>] [--salinity <float>] [--dg-max <float>] [--min-gc <float>] 
[--max-gc <float>] [--min-complexity <float>] [--max-len <int>] [--check-var] 
[--fp-prefix <DNA>] [--rp-prefix <DNA>] [--seed <int>] [--threads <int>]

Argument	Default	Description
olvr-path		Positional argument. Path to the Olivar reference file (.olvr), or the directory of reference files for multiple targets
--output, -o	./	Output path (output to current directory by default).
--title, -t	olivar-design	Name of design.
--max-amp-len	420	Maximum amplicon length.
--min-amp-len	None	Minimum amplicon length. 0.9*{max-amp-len} if not provided.
--w-egc	1.0	Weight for extreme GC content.
--w-lc	1.0	Weight for low sequence complexity.
--w-ns	1.0	Weight for non-specificity.
--w-var	1.0	Weight for variations.
--temperature	60.0	PCR annealing temperature.
--salinity	0.18	Concentration of monovalent ions in units of molar.
--dg-max	-11.8	Maximum free energy change of a primer in kcal/mol.
--min-gc	0.2	Minimum GC content of a primer.
--max-gc	0.75	Maximum GC content of a primer.
--min-complexity	0.4	Minimum sequence complexity of a primer.
--max-len	36	Maximum length of a primer.
--check-var	False	Boolean flag. Filter out primer candidates with variations within 5nt of 3' end. NOT recommended when a lot of variations are provided, since this would significantly reduce the number of primer candidates.
--fp-prefix	None	Prefix of forward primer. Empty by default.
--rp-prefix	None	Prefix of reverse primer. Empty by default.
--seed	10	Random seed for optimizing PDRs and SADDLE.
--threads, -p	1	Number of threads.
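
To illustrate how some of these defaults interact, here is a minimal sketch of the derived minimum amplicon length and of a simple GC and length check on a primer candidate. Rounding to the nearest integer and treating the GC bounds as inclusive are assumptions, and the free energy and complexity criteria are not modeled; the function names are illustrative only.

```python
def default_min_amp_len(max_amp_len=420):
    """Minimum amplicon length when --min-amp-len is not given (0.9 * max)."""
    # Assumption: the product is rounded to the nearest integer
    return round(0.9 * max_amp_len)


def passes_basic_filters(primer, min_gc=0.2, max_gc=0.75, max_len=36):
    """Check primer length and GC content against the documented defaults."""
    primer = primer.upper()
    if not primer or len(primer) > max_len:
        return False
    gc = (primer.count("G") + primer.count("C")) / len(primer)
    # Assumption: bounds are inclusive on both ends
    return min_gc <= gc <= max_gc
```

With the default `--max-amp-len` of 420, this gives a minimum amplicon length of 378 unless `--min-amp-len` is set explicitly, as in the step 2 example (252).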

sub-command: `save`

olivar save olvd-file [--output <string>]

Argument	Default	Description
olvd-file		Positional argument. Path to the Olivar design file (.olvd)
--output, -o	./	Output directory (output to current directory by default).

sub-command: `validate`

olivar validate csv-file [--pool <int>] [--db <string>] [--output <string>] 
[--title <string>] [--max-amp-len <int>] [--temperature <float>] [--threads <int>]

Argument	Default	Description
csv-file		Positional argument. Path to the csv file of a primer pool. Required columns: "amplicon_id" (amplicon name), "fP" (sequence of forward primer), "rP" (sequence of reverse primer), "pool" (pool number, e.g., 1).
--pool	1	Primer pool number.
--db, -d	None	Optional, path to the BLAST database. Note that this path should end with the name of the BLAST database (e.g., "example_input/Human/GRCh38_primary").
--output, -o	./	Output directory (output to current directory by default).
--title, -t	olivar-val	Name of validation.
--max-amp-len	1500	Maximum length of predicted non-specific amplicon. Ignored if no BLAST database is provided.
--temperature	60.0	PCR annealing temperature.
--threads, -p	1	Number of threads.

Prepare a BLAST database

[!TIP] All BLAST related commands/scripts are installed along with Olivar.
To make your own BLAST database with the makeblastdb command, check out the NCBI BLAST User Manual.
The example BLAST database was created with 23 chromosomes and MT of human genome assembly GRCh38, with the command (BLAST version 2.12.0):
makeblastdb -in GRCh38_primary.fasta -dbtype nucl -title GRCh38_primary -parse_seqids -hash_index -out GRCh38_primary -max_file_sz 4GB -logfile makeblastdb.out -taxid 9606
To download a pre-built BLAST database from NCBI (e.g., RefSeq representative genomes for viruses), use the update_blastdb.pl script:
update_blastdb.pl --decompress ref_viruses_rep_genomes
For more details about update_blastdb.pl, check the BLAST Help.
For more pre-built databases, check the NCBI FTP site.
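
Because the `--db` path must end with the database name rather than its directory, a quick check is to list the files that share that name as a prefix. The sketch below assumes that a nucleotide database consists of several files named after the database with different extensions (such as .nhr, .nin and .nsq); the function name is hypothetical.

```python
from pathlib import Path


def blast_db_files(db_path):
    """List files that share the database name as a prefix."""
    db = Path(db_path)
    if not db.parent.is_dir():
        return []
    return sorted(
        p.name
        for p in db.parent.iterdir()
        if p.is_file() and p.name.startswith(db.name + ".")
    )
```

An empty list for `blast_db_files("example_input/Human/GRCh38_primary")` would suggest that the path points at a directory or a misspelled database name.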
