Plain simple amplicon sequence simulator for in-silico genomic sequencing assays
TL;DR: no external requirements are needed. Both the recursive GitHub clone and the bioconda package should work out of the box.
The easiest way to install amplisim is via the conda package manager from the bioconda channel. Please note that the conda installation is currently only available for Linux operating systems.
# create a new conda environment
conda create --name amplisim
# activate the environment so that amplisim is installed into it
conda activate amplisim
# install the latest amplisim version from the bioconda channel
conda install -c bioconda amplisim
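Alternatively, amplisim can be built from source. Clone the repository recursively so that its submodules, including the htslib sources in lib/htslib, are fetched as well:
git clone --recursive https://github.com/Krannich479/amplisim.git
cd amplisim
mkdir build
# build the bundled htslib first, then amplisim itself
make -C lib/htslib
make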
A quick way to test your installation is to download and run amplisim on public SARS-CoV-2 data: the ARTIC V5.3.2 primer scheme and the Wuhan-Hu-1 reference genome (MN908947.3).
mkdir testdata && cd testdata
wget https://raw.githubusercontent.com/artic-network/primer-schemes/master/nCoV-2019/V5.3.2/SARS-CoV-2.primer.bed
wget https://www.ebi.ac.uk/ena/browser/api/fasta/MN908947.3
sed 's/>ENA|MN908947|MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome./>MN908947.3/g' MN908947.3 > MN908947.3.fasta
cd ..
amplisim testdata/MN908947.3.fasta testdata/SARS-CoV-2.primer.bed
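The sed step above shortens the FASTA header of the downloaded genome to MN908947.3. Our reading, which is an assumption and not stated here, is that this makes the record name match the chromosome column (column 1) of the primer BED file. The sketch below uses a hypothetical helper, missing_chroms, which is not part of amplisim, to report BED chromosome names that have no matching FASTA record. It also assumes a record name is the header text up to the first blank, and it runs on small made-up files rather than the real downloads.

```shell
set -eu

# Hypothetical helper (not part of amplisim): print every chromosome name from
# column 1 of a BED file ($2) that has no matching record name in a FASTA file ($1).
# Assumption: a record name is the header text after '>' up to the first blank.
missing_chroms() {
    awk -F'\t' '
        NR == FNR {                          # first file: reference FASTA
            if ($0 ~ /^>/) {
                name = substr($0, 2)
                sub(/[ \t].*$/, "", name)    # drop the description part
                seen[name] = 1
            }
            next
        }
        NF > 0 && $1 !~ /^#/ && !($1 in seen) { print $1 }   # second file: BED
    ' "$1" "$2" | sort -u
}

# Toy files with made-up content, mimicking the header before and after the sed step.
tmp=$(mktemp -d)
printf '>ENA|MN908947|MN908947.3 Severe acute respiratory syndrome coronavirus 2\nACGTACGT\n' > "$tmp/raw.fasta"
printf '>MN908947.3\nACGTACGT\n' > "$tmp/renamed.fasta"
printf 'MN908947.3\t0\t3\tdemo_1_LEFT\t1\t+\nMN908947.3\t5\t8\tdemo_1_RIGHT\t1\t-\n' > "$tmp/primers.bed"

before=$(missing_chroms "$tmp/raw.fasta" "$tmp/primers.bed")
after=$(missing_chroms "$tmp/renamed.fasta" "$tmp/primers.bed")
echo "unmatched before renaming: ${before:-none}"   # MN908947.3
echo "unmatched after renaming: ${after:-none}"     # none
rm -r "$tmp"
```

To check the real test data, run missing_chroms testdata/MN908947.3.fasta testdata/SARS-CoV-2.primer.bed; empty output means every BED chromosome name has a matching FASTA record.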
The most concise way to get familiar with amplisim is to inspect its help page via amplisim --help, which displays:
Usage: amplisim [OPTION...] REFERENCE PRIMERS
amplisim -- a program to simulate amplicon sequences from a reference genome
-m, --mean=INT Set the mean number of replicates per amplicon
-n, --sd=INT Set the standard deviation for the mean number of
replicates per amplicon
-o, --output=FILE Output to FILE instead of standard output
-s, --seed=INT Set a random seed
-x, --dropout=INT Set the likelihood for an amplicon dropout [0,1]
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
Report bugs to https://github.com/rki-mf1/amplisim/issues.
The minimal command to run amplisim provides a reference genome in FASTA format and a set of primers in BED format (see the Input and output section for more details). By default, amplisim prints the amplicon sequences to standard output, so you can either redirect them to a file or pipe them into the next program.
amplisim <my_reference.fasta> <my_primers.bed> > <my_amplicons.fasta>
If you want amplisim to write the resulting amplicon sequences directly to a FASTA file, use the -o option.
amplisim -o <my_amplicons.fasta> <my_reference.fasta> <my_primers.bed>
The PRIMERS input file is a plain, tab-separated text file with predefined columns.
The format of the PRIMERS file required by amplisim has to comply with the following properties:
These format properties generally comply with the definitions in samtools but are slightly more stringent, as amplisim currently does not allow alternative primers within a pair. Examples that fit directly can be found in the artic-network repository for virus primer schemes, e.g. the primers for SARS-CoV-2.
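Since the full list of properties is not reproduced here, the following pre-flight check is only a sketch under assumptions: ARTIC-style BED lines with at least six tab-separated columns (chromosome, start, end, name, pool, strand), primer names carrying a _LEFT or _RIGHT tag, and exactly one LEFT and one RIGHT primer per amplicon, which is how we read the restriction on alternative primers. The helper check_primer_bed is hypothetical and not part of amplisim.

```shell
set -eu

# Hypothetical helper (not part of amplisim): print one line per problem found in
# a primer BED file. Assumed layout (ARTIC-style): chrom, start, end, name, pool,
# strand, with names such as <scheme>_<n>_LEFT / <scheme>_<n>_RIGHT.
check_primer_bed() {
    awk -F'\t' '
        NF == 0 || $1 ~ /^#/ { next }                  # skip blank and comment lines
        NF < 6 { printf "line %d: expected at least 6 columns, found %d\n", NR, NF; next }
        $2 + 0 >= $3 + 0 { printf "line %d: start %s is not below end %s\n", NR, $2, $3 }
        {
            if (match($4, /_(LEFT|RIGHT)/)) {
                amp = substr($4, 1, RSTART - 1)        # text before the tag
                side = substr($4, RSTART + 1, RLENGTH - 1)
                count[amp, side]++
                amps[amp] = 1
            } else {
                printf "line %d: name %s has no _LEFT/_RIGHT tag\n", NR, $4
            }
        }
        END {
            # assumption: exactly one LEFT and one RIGHT primer per amplicon
            for (a in amps)
                if (count[a, "LEFT"] != 1 || count[a, "RIGHT"] != 1)
                    printf "amplicon %s: %d LEFT, %d RIGHT primers\n", a, count[a, "LEFT"], count[a, "RIGHT"]
        }
    ' "$1"
}

tmp=$(mktemp -d)
# Made-up scheme: amplicon 1 is a clean pair, amplicon 2 has an extra (alternative) LEFT primer.
printf 'ref\t10\t30\tdemo_1_LEFT\t1\t+\nref\t400\t420\tdemo_1_RIGHT\t1\t-\n' > "$tmp/primers.bed"
printf 'ref\t300\t320\tdemo_2_LEFT\t2\t+\nref\t305\t325\tdemo_2_LEFT_alt\t2\t+\nref\t700\t720\tdemo_2_RIGHT\t2\t-\n' >> "$tmp/primers.bed"

problems=$(check_primer_bed "$tmp/primers.bed")
echo "${problems:-no problems found}"   # amplicon demo_2: 2 LEFT, 1 RIGHT primers
rm -r "$tmp"
```

Primers are grouped by the name text in front of the _LEFT/_RIGHT tag, so suffixed names such as demo_2_LEFT_alt still parse but show up as an extra primer in the per-amplicon count.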
The REFERENCE input file is a standard text file in FASTA format that contains one or multiple records (chromosomes).
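To see which records (chromosomes) a multi-record reference contains, and how long each one is, a short awk pass over the file is enough. This is a sketch that assumes the record name is the header text up to the first blank; fasta_lengths is a hypothetical helper, not an amplisim function, and the input is made up.

```shell
set -eu

# Hypothetical helper (not part of amplisim): print "<record name><TAB><length>"
# for every record of a FASTA file, joining sequences wrapped over several lines.
# Assumption: the record name is the header text up to the first blank.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len   # finish the previous record
            name = substr($0, 2)
            sub(/[ \t].*$/, "", name)
            len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }  # ignore blanks and CR line ends
        END { if (name != "") print name "\t" len }
    ' "$1"
}

tmp=$(mktemp -d)
# Made-up two-record reference; chr1 is wrapped over two lines.
printf '>chr1 first record\nACGTACGTAC\nACGTA\n>chr2\nGGGCCC\n' > "$tmp/ref.fasta"
lengths=$(fasta_lengths "$tmp/ref.fasta")
echo "$lengths"   # chr1 15, then chr2 6 (tab-separated)
rm -r "$tmp"
```

For the SARS-CoV-2 test data, fasta_lengths testdata/MN908947.3.fasta should list a single record named MN908947.3.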
The output of amplisim is a stream or plain text file in FASTA format.
The header line of each amplicon sequence provides the following information:
>amplicon_<amplicon_index>_<replicate_index>
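where <amplicon_index> identifies the amplicon and <replicate_index> identifies the replicate of that amplicon, separated by underscores (_).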
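Because each header encodes the amplicon and replicate index, the output can be summarized by counting headers per amplicon index, for example to compare replicate numbers against the -m and -n settings or to spot amplicons that are absent, which we assume is how a dropout (-x) shows up. The helper replicates_per_amplicon is hypothetical, and the input below is made-up output, not something produced by amplisim.

```shell
set -eu

# Hypothetical helper (not part of amplisim): count output records per amplicon,
# based on headers of the form >amplicon_<amplicon_index>_<replicate_index>.
replicates_per_amplicon() {
    awk -F'_' '/^>amplicon_/ { n[$2]++ } END { for (a in n) print a "\t" n[a] }' "$1" | sort -n
}

tmp=$(mktemp -d)
# Made-up output: three replicates of amplicon 1, one of amplicon 3, none of amplicon 2.
printf '>amplicon_1_1\nACGT\n>amplicon_1_2\nACGT\n>amplicon_1_3\nACGT\n>amplicon_3_1\nGGCC\n' > "$tmp/amplicons.fasta"
counts=$(replicates_per_amplicon "$tmp/amplicons.fasta")
echo "$counts"   # 1 3, then 3 1 (tab-separated)
rm -r "$tmp"
```

Since amplisim writes to standard output by default, the helper can also read from a pipe: awk treats the file name - as standard input, so amplisim <my_reference.fasta> <my_primers.bed> | replicates_per_amplicon - should work under the same assumptions.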
For questions, feature requests and bug reports about amplisim, please refer to the issues section of this repository.