NGS tool for detecting MEI and gene retrotransposition events in WGS and WES data
Mobster: accurate detection of mobile element insertions in next generation sequencing data.
Thung et al. Genome Biol. 2014
Mobster depends on an aligner to align candidate reads to the mobiome reference. By default the aligner is MOSAIK, which can be installed via bioconda or by cloning the MOSAIK git repository.
Mobster is written in Java (tested with Java 8) and built using Maven, so both are assumed to be in your PATH.
git lfs is also required; see its documentation for more detail.
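Before building, it can help to confirm that the prerequisites are actually on the PATH. The following is a minimal sketch, not part of Mobster itself: the helper name missing_tools is hypothetical, and the binary names java, mvn and git-lfs are assumptions about a typical installation.

```shell
#!/bin/sh
# Print the name of every listed tool that cannot be found in PATH.
# (missing_tools is a hypothetical helper, not shipped with Mobster.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# java and mvn are needed for the build, git-lfs for files stored via git lfs.
# The binary names are assumptions; adjust them for your system.
missing=$(missing_tools java mvn git-lfs)
if [ -n "$missing" ]; then
    echo "Not found in PATH:" $missing
else
    echo "All build prerequisites found."
fi
```

The same helper can be reused for the aligner binaries once you know which executables your aligner installs.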
The sources can be cloned to any directory:
git clone git@github.com:jyhehir/mobster.git
Then, to build mobster simply run "install.sh".
cd mobster
./install.sh
This packages the required classes into a fat executable jar in the "target" directory, typically named MobileInsertions-<version>.jar (for example, MobileInsertions-0.2.4.1.jar).
After installation, the aligner of choice (e.g. MOSAIK) is assumed to be in the PATH. If it is not, update Mobster.properties to include the location of the aligner.
To use GRCh38, unpack the compressed repmask resource:
gunzip alu_l1_herv_sva_other_grch38_accession_ucsc.rpmsk.gz
Try running Mobster with the provided sample BAM:
cd target
java -Xmx8G -jar MobileInsertions-0.2.4.1.jar \
-properties Mobster.properties \
-out TestSample
Example of how to call the program for a single sample:
java -Xmx8G -jar MobileInsertions-1.0-SNAPSHOT.jar \
-properties Mobster_latest.properties \
-in input.bam \
-sn test_sample \
-out mobster_test
You can also run the program in multiple sample mode. For this, set MULTIPLE_SAMPLE_CALLING in the properties file to true. Then you can run Mobster like this:
java -Xmx8G -jar MobileInsertions-1.0-SNAPSHOT.jar \
-properties Mobster_latest.properties \
-in A1_child.bam,A1_father.bam,A1_mother.bam \
-sn A1_child,A1_father,A1_mother \
-out A1_trio_mobster
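In the trio example, the -in and -sn values are comma-separated lists in the same order. A minimal sketch for building both from a list of BAM files is shown here; the helper names join_by_comma and mobster_sample_args are hypothetical, and deriving each sample name from the file name (minus ".bam") is an assumption that only holds if your files are named that way.

```shell
#!/bin/sh
# Join all arguments with commas: join_by_comma a b c -> a,b,c
join_by_comma() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Print matching -in and -sn arguments for the given BAM files.
# Assumption: the sample name is the file name without directory and ".bam".
mobster_sample_args() {
    in_list=$(join_by_comma "$@")
    sn_list=""
    for bam in "$@"; do
        sn_list="${sn_list:+$sn_list,}$(basename "$bam" .bam)"
    done
    printf '%s\n' "-in $in_list -sn $sn_list"
}

# File names taken from the trio example.
mobster_sample_args A1_child.bam A1_father.bam A1_mother.bam
```

Keeping both lists in one helper avoids the two arguments drifting out of order. Paths containing spaces or commas are not handled by this sketch.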
Important:
Some files are stored with git lfs. These will not be retrieved with the initial git clone, but will be downloaded when you run "install.sh".
For multiple sample calling, set MULTIPLE_SAMPLE_CALLING to "true" (the default is "false").
Set the READ_LENGTH property to the appropriate value for your BAM file in the properties file.
MAPPING_TOOL is set to "unspecified" and MINIMUM_MAPQ_ANCHOR is set to 20. I would leave this as is.
Set MINIMUM_POLYA_LENGTH to 7.
Mobster needs a GRCh37/GRCh38 repmask library file.
The location of this file is set in the Mobster.properties file:
REPEATMASK_FILE=../resources/repmask/hg19_alul1svaerv.rpmsk
REPEATMASK_FILE=../resources/repmask/alu_l1_herv_sva_other_grch38_ucsc.rpmsk
WARNING: Unpack the compressed repmask resource (alu_l1_herv_sva_other_grch38_accession_ucsc.rpmsk.gz) during installation.
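Properties such as REPEATMASK_FILE, MULTIPLE_SAMPLE_CALLING or READ_LENGTH can be edited by hand, but a small helper makes the change repeatable. This is a minimal sketch: set_prop is a hypothetical function, and it assumes the properties file uses plain KEY=VALUE lines without spaces around "=", as in the REPEATMASK_FILE lines above.

```shell
#!/bin/sh
# set_prop FILE KEY VALUE
# Replace the line "KEY=..." in FILE, or append it if KEY is not present.
# (Hypothetical helper; assumes KEY=VALUE lines with no spaces around "=".)
set_prop() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; found = 0 }
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (run from the directory that holds Mobster.properties;
# the READ_LENGTH value is illustrative only):
# set_prop Mobster.properties MULTIPLE_SAMPLE_CALLING true
# set_prop Mobster.properties READ_LENGTH 100
```

Commented-out lines in the file are left untouched because their first field starts with "#". Values containing backslashes would be interpreted by awk, so this sketch is best suited to simple values and ordinary paths.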
You can freely download an updated repmask file from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). There are many output options; here are the changes that you'll need to make:
Save the output as alu_l1_herv_sva_other_grch37_ucsc.rpmsk.tmp or alu_l1_herv_sva_other_grch38_ucsc.rpmsk.tmp. Then filter this file and keep only the lines with: