shohei-kojima / MEGAnE

MEGAnE
MIT License
24 stars 2 forks source link

DOI

MEGAnE

MEGAnE, Mobile Element Genotyping Analysis Environment, identifies and genotypes polymorphic mobile elements from short-read whole genome shotgun sequencing data (WGS). The current version does not support whole exome sequencing data nor is it tuned to detect somatic polymorphisms. The initial release of MEGAnE officially supports human and mouse datasets. However, we designed MEGAnE to allow analysis of other species, if the end user provides a repeat library (e.g. consensus sequences from RepBase or Dfam).

MEGAnE (眼鏡 in Japanese) is pronounced like "mega" + "ne" as in "net." In Japanese, megane refers to a glass/lens that fine-tunes our vision enabling us to see something more clearly or understand truth.

Currently MEGAnE is beta version. Please use this version at your own risk.

Citation

Mobile elements in human population-specific genome and phenotype divergence
Shohei Kojima et al, bioRxiv 2022.03.25.485726; doi: https://doi.org/10.1101/2022.03.25.485726

Quick guide

Please see our Wiki page for further instruction and details.

Installation

MEGAnE can be available as docker and Singularity containers from dockerhub.
We highly recommend using such containers rather than preparing the required environment by yourself.

# build for Singlarity
sudo singularity build MEGAnE.sif docker://shoheikojima/megane:v1.0.0.beta
# or 
singularity build --fakeroot MEGAnE.sif docker://shoheikojima/megane:v1.0.0.beta

# build for Docker
docker pull docker://shoheikojima/megane:v1.0.0.beta

Input file

In-depth usage

Please see our Wiki page.

Quick usage for human WGS

Step 0. Prepare MEGAnE k-mer file

sif=/path/to/MEGAnE.sif

singularity exec ${sif} build_kmerset \
-fa /path/to/reference_human_genome.fa \
-prefix reference_human_genome \
-outdir megane_kmer_set

Step 1. Call and genotype polymorphic MEs

sif=/path/to/MEGAnE.sif

# In the case of BAM file mapping to GRCh37-related genome (e.g. hs37d5, human_g1k_v37, hg19)
singularity exec ${sif} call_genotype_37 \
-i /path/to/input.bam \
-fa /path/to/reference_human_genome.fa \
-mk /path/to/megane_kmer_set/reference_human_genome.mk \
-outdir MEGAnE_result_test \
-sample_name test_sample \
-p 4

# In the case of CRAM file mapping to GRCh38-related genome (e.g. GRCh38DH, hg38)
singularity exec ${sif} call_genotype_38 \
-i /path/to/input.cram \
-fa /path/to/reference_human_genome.fa \
-mk /path/to/megane_kmer_set/reference_human_genome.mk \
-outdir MEGAnE_result_test \
-sample_name test_sample \
-p 4

Step 2. Joint calling

sif=/path/to/MEGAnE.sif

# first, list up samples (output directories from Step 1) you are going to merge
ls -d /path/to/[all_output_directories] > dirlist.txt

# merge non-reference ME insertions
singularity exec ${sif} joint_calling_hs \
-merge_mei \
-f dirlist.txt \
-fa /path/to/reference_human_genome.fa \
-cohort_name test

# merge reference ME polymorphisms
singularity exec ${sif} joint_calling_hs \
-merge_absent_me \
-f dirlist.txt \
-fa /path/to/reference_human_genome.fa \
-cohort_name test

(Optional) Step 3. Make a joint call for haplotype phasing

sif=/path/to/MEGAnE.sif

singularity exec ${sif} reshape_vcf \
-i /path/to/jointcall_out/[cohort_name]_MEI_jointcall.vcf \
-a /path/to/jointcall_out/[cohort_name]_MEA_jointcall.vcf \
-cohort_name test

(Optional) Example haplotype phasing of MEGAnE's result

threads=6
memory=30000

# first, generate a SNP VCF
snp_vcf=/path/to/SNP.vcf chr=1
exclude=/path/to/vcf_for_phasing/cohort_name_biallelic.bed.gz
chr=1

plink2 \
--threads ${threads} \
--memory ${memory} \
--vcf ${snp_vcf} \
--make-pgen \
--max-alleles 2 \
--mac 2 \
--hwe 1e-6 \
--chr ${chr} \
--indiv-sort natural \
--exclude bed0 ${exclude} \
--export vcf bgz \
--out SNP

# next, generate a ME VCF
me_vcf=/path/to/vcf_for_phasing/[cohort_name]_biallelic.vcf.gz

plink2 \
--threads ${threads} \
--memory ${memory} \
--vcf ${me_vcf} \
--make-pgen \
--max-alleles 2 \
--mac 2 \
--hwe 1e-6 \
--chr ${chr} \
--indiv-sort natural \
--vcf-half-call missing \
--export vcf bgz \
--out ME

# merge SNP and ME VCFs
cat SNP.vcf.gz > _SNP_ME.vcf.gz
zcat ME.vcf.gz | grep -v '#' | pigz -c >> _SNP_ME.vcf.gz
zcat _SNP_ME.vcf.gz | pigz -c > SNP_ME.vcf.gz
rm  _SNP_ME.vcf.gz

plink2 \
--threads ${threads} \
--memory ${memory} \
--vcf SNP_ME.vcf.gz \
--make-pgen \
--sort-vars \
--vcf-half-call missing \
--out SNP_ME

plink2 \
--threads ${threads} \
--memory ${memory} \
--pfile SNP_ME \
--export vcf bgz \
--out SNP_ME_sorted

tabix SNP_ME_sorted.vcf.gz

# haplotype-phasing
map=/path/to/genetic_map

shapeit4 \
--input SNP_ME_sorted.vcf.gz \
--map ${map} \
--region ${chr} \
--thread ${threads} \
--output SNP_ME.phased.vcf.gz

Acknowledgement