MEGAnE, Mobile Element Genotyping Analysis Environment, identifies and genotypes polymorphic mobile elements from short-read whole genome shotgun sequencing data (WGS). The current version does not support whole exome sequencing data nor is it tuned to detect somatic polymorphisms. The initial release of MEGAnE officially supports human and mouse datasets. However, we designed MEGAnE to allow analysis of other species, if the end user provides a repeat library (e.g. consensus sequences from RepBase or Dfam).
MEGAnE (眼鏡 in Japanese) is pronounced like "mega" + "ne" as in "net." In Japanese, megane refers to a glass/lens that fine-tunes our vision enabling us to see something more clearly or understand truth.
Currently MEGAnE is beta version. Please use this version at your own risk.
Mobile elements in human population-specific genome and phenotype divergence
Shohei Kojima et al, bioRxiv 2022.03.25.485726; doi: https://doi.org/10.1101/2022.03.25.485726
Please see our Wiki page for further instruction and details.
MEGAnE can be available as docker and Singularity containers from dockerhub.
We highly recommend using such containers rather than preparing the required environment by yourself.
# build for Singlarity
sudo singularity build MEGAnE.sif docker://shoheikojima/megane:v1.0.0.beta
# or
singularity build --fakeroot MEGAnE.sif docker://shoheikojima/megane:v1.0.0.beta
# build for Docker
docker pull docker://shoheikojima/megane:v1.0.0.beta
-skip_unmapped
should be specified). -lowdep
option. Please see our Wiki page.
sif=/path/to/MEGAnE.sif
singularity exec ${sif} build_kmerset \
-fa /path/to/reference_human_genome.fa \
-prefix reference_human_genome \
-outdir megane_kmer_set
sif=/path/to/MEGAnE.sif
# In the case of BAM file mapping to GRCh37-related genome (e.g. hs37d5, human_g1k_v37, hg19)
singularity exec ${sif} call_genotype_37 \
-i /path/to/input.bam \
-fa /path/to/reference_human_genome.fa \
-mk /path/to/megane_kmer_set/reference_human_genome.mk \
-outdir MEGAnE_result_test \
-sample_name test_sample \
-p 4
# In the case of CRAM file mapping to GRCh38-related genome (e.g. GRCh38DH, hg38)
singularity exec ${sif} call_genotype_38 \
-i /path/to/input.cram \
-fa /path/to/reference_human_genome.fa \
-mk /path/to/megane_kmer_set/reference_human_genome.mk \
-outdir MEGAnE_result_test \
-sample_name test_sample \
-p 4
sif=/path/to/MEGAnE.sif
# first, list up samples (output directories from Step 1) you are going to merge
ls -d /path/to/[all_output_directories] > dirlist.txt
# merge non-reference ME insertions
singularity exec ${sif} joint_calling_hs \
-merge_mei \
-f dirlist.txt \
-fa /path/to/reference_human_genome.fa \
-cohort_name test
# merge reference ME polymorphisms
singularity exec ${sif} joint_calling_hs \
-merge_absent_me \
-f dirlist.txt \
-fa /path/to/reference_human_genome.fa \
-cohort_name test
sif=/path/to/MEGAnE.sif
singularity exec ${sif} reshape_vcf \
-i /path/to/jointcall_out/[cohort_name]_MEI_jointcall.vcf \
-a /path/to/jointcall_out/[cohort_name]_MEA_jointcall.vcf \
-cohort_name test
threads=6
memory=30000
# first, generate a SNP VCF
snp_vcf=/path/to/SNP.vcf chr=1
exclude=/path/to/vcf_for_phasing/cohort_name_biallelic.bed.gz
chr=1
plink2 \
--threads ${threads} \
--memory ${memory} \
--vcf ${snp_vcf} \
--make-pgen \
--max-alleles 2 \
--mac 2 \
--hwe 1e-6 \
--chr ${chr} \
--indiv-sort natural \
--exclude bed0 ${exclude} \
--export vcf bgz \
--out SNP
# next, generate a ME VCF
me_vcf=/path/to/vcf_for_phasing/[cohort_name]_biallelic.vcf.gz
plink2 \
--threads ${threads} \
--memory ${memory} \
--vcf ${me_vcf} \
--make-pgen \
--max-alleles 2 \
--mac 2 \
--hwe 1e-6 \
--chr ${chr} \
--indiv-sort natural \
--vcf-half-call missing \
--export vcf bgz \
--out ME
# merge SNP and ME VCFs
cat SNP.vcf.gz > _SNP_ME.vcf.gz
zcat ME.vcf.gz | grep -v '#' | pigz -c >> _SNP_ME.vcf.gz
zcat _SNP_ME.vcf.gz | pigz -c > SNP_ME.vcf.gz
rm _SNP_ME.vcf.gz
plink2 \
--threads ${threads} \
--memory ${memory} \
--vcf SNP_ME.vcf.gz \
--make-pgen \
--sort-vars \
--vcf-half-call missing \
--out SNP_ME
plink2 \
--threads ${threads} \
--memory ${memory} \
--pfile SNP_ME \
--export vcf bgz \
--out SNP_ME_sorted
tabix SNP_ME_sorted.vcf.gz
# haplotype-phasing
map=/path/to/genetic_map
shapeit4 \
--input SNP_ME_sorted.vcf.gz \
--map ${map} \
--region ${chr} \
--thread ${threads} \
--output SNP_ME.phased.vcf.gz