hackseq / 2016_project_6

Inferring sex chromosome and autosomal ploidy in NGS data
2 stars 1 forks source link

Realign 1000G data to GRCh38 #8

Open BrunoGrandePhD opened 7 years ago

mathbionerd commented 7 years ago

Here is the command used for the 1000 genomes high coverage whole genome from here:

bwa mem -t 1 -B 4 -O 6 -E 1 -M -R $rg_string $reference_fasta_file $fastq_file(1) $fastq_file(2) | k8 bwa-postalt.js -p $prefix_hla_hit $reference_fasta_file.alt | samtools view -1 - > $bam_file

Madelinehazel commented 7 years ago

Samples:

Plan:

  1. Align fastq files for the two samples using bwa mem (with -a option to output all possible alignments)
  2. Align fastq files for the two samples using mrFAST (by default, returns all possible alignments)
Madelinehazel commented 7 years ago

bwa mem command (for HG00419)

bwa mem -t 12 -B 4 -O 6 -E 1 -M -a \ -R "@RG\tID:HG00419\tPL:Illumina\tPU:TOTO2\tLB:TOTO\tSM:TOTO3" \ /mnt/data/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.fa \ /mnt/data/HG00419/HG00419_1_cat.filt.fastq.gz \ /mnt/data/HG00419/HG00419_2_cat.filt.fastq.gz>HG00419_bwamem.bam

BrunoGrandePhD commented 7 years ago

Illumina iGenomes: http://support.illumina.com/sequencing/sequencing_software/igenome.html

Madelinehazel commented 7 years ago

Downloaded hg38 from UCSC at http://support.illumina.com/sequencing/sequencing_software/igenome.html

Madelinehazel commented 7 years ago

Aligning NA20845 with bwa mem using UCSC hg38:

bwa mem -t 11 -B 4 -O 6 -E 1 -M -R "@RG\tID:NA20845\tPL:Illumina\tPU:TOTO2\tLB:TOTO\tSM:TOTO3" \ /mnt/data/UCSC_GRCh38/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/version0.6.0/genome.fa \ /mnt/data/NA20845/NA20845_1_cat.filt.fastq.gz /mnt/data/NA20845/NA20845_2_cat.filt.fastq.gz > NA20845_bwamem_UCSC.sam

And similarly for HG00419: bwa mem -t 12 -B 4 -O 6 -E 1 -M -R "@RG\tID:HG00419\tPL:Illumina\tPU:TOTO2\tLB:TOTO\tSM:TOTO3" \ /mnt/data/UCSC_GRCh38/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/version0.6.0/genome.fa \ /mnt/data/HG00419/HG00419_1_cat.filt.fastq.gz /mnt/data/HG00419/HG00419_2_cat.filt.fastq.gz > HG00419_bwamem_UCSC.sam

Madelinehazel commented 7 years ago

Sort and index bam files: samtools view -bS $SAM | samtools sort -o $BAM samtools index $BAM

Madelinehazel commented 7 years ago

extract X and Y reads, convert to fastq, separate into read1 and read2: `samtools view -b $BAM chrX > $BAM.X samtools view -b $BAM chrY > $BAM.Y

cat $BAM.X $BAM.Y > $BAM.XY samtools bam2fq $BAM.XY > $OUTFQ

cat $OUTFQ | grep '^@._/1$' -A 3 --no-group-separator > $BAMr1.fastq cat $OUTFQ | grep '^@./2$' -A 3 --no-group-separator > $BAM_r2.fastq`

Madelinehazel commented 7 years ago

reorder XY reads (with bbmap_36.49): /mnt/data/bbmap/repair.sh in1=$READ1 in2=$READ2 out1=$READ1.fix out2=$READ2.fix

Madelinehazel commented 7 years ago

realign XY reads to GRCh38 chrX: bwa mem -t 20 -B 4 -O 6 -E 1 -M \ -R "@RG\tID:$SAMPLE\tPL:Illumina\tPU:TOTO2\tLB:TOTO\tSM:TOTO3" \ $REF $READ1 $READ2 > $OUT

Madelinehazel commented 7 years ago

strip X and Y reads from original bam file: samtools idxstats $BAM | cut -f 1 | grep -v chrX | grep -v chrY | \ xargs samtools view -b $BAM > $OUT

Madelinehazel commented 7 years ago

scp'ed HG00419 (Female, CHS) chrX, chrY, chr19 bams (aligned by bwa-mem, see above) to /home/msayres.

scp'ed X and Y reads realigned to X chromosome to /home/msayres: -HG00419_bwamem_XY_realign.bam

The question is: for a female, does remapping of Y reads to X chromosome improve variant calling?

@Phillip-a-richmond

Madelinehazel commented 7 years ago

and finally, merging the realigned X reads to the stripped bam:

samtools merge $OUT $STRIPPEDBAM $XBAM