stschiff / msmc2

GNU General Public License v3.0

Does bam file need to be deduplicated? #60

Closed enluo211 closed 6 months ago

enluo211 commented 7 months ago

First, fastp/0.23.0 was used for quality control of the raw sequencing data. Next, the reads were aligned with BWA/0.7.17 against the B73v4 reference genome, and finally Picard/2.1.1 was used to modify the header of the BAM file. The script looks like this:

#!/bin/sh

prx=$1
module load fastp/0.23.0
module load BWA/0.7.17
module load picard/2.1.1-Java-1.8.0_92
module load Java/1.8.0_92
dir1=/public/home/eluo/sswu
dir2=/public/home/eluo/sswu/TEO
genome=/public/home/eluo/luoen/Zea_mays.AGPv4.dna.toplevel.fa
cd $dir2

# Filtering
fastp -g -w 20 -l 150 -i ${dir1}/$prx.R1.fq.gz -I ${dir1}/$prx.R2.fq.gz -o ${dir2}/$prx.1.trimed.fq.gz -O ${dir2}/$prx.2.trimed.fq.gz -h ${dir2}/${prx}.html

# Alignment
bwa mem -t 20 $genome $dir2/$prx.1.trimed.fq.gz $dir2/$prx.2.trimed.fq.gz | samtools sort -@20 -o $dir2/$prx.sorted.bam

java -Xmx30g -jar ${EBROOTPICARD}/picard.jar AddOrReplaceReadGroups I=$dir2/${prx}.sorted.bam O=$dir2/$prx.addrg.sort.bam RGID=$prx RGPU=unkn-0.0 RGLB=lib$prx RGSM=$prx RGPL=ILLUMINA
samtools index $dir2/$prx.addrg.sort.bam

stschiff commented 6 months ago

Yes, you should generally deduplicate your BAM files, if that answers your question.
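
For illustration only, a deduplication step could be slotted in after the read-group step of the script above. This is a minimal sketch, not a recommendation from the msmc2 author: it assumes Picard's `MarkDuplicates` tool with the old-style `I=`/`O=`/`M=` arguments used elsewhere in the script, the output names `*.dedup.bam` and `*.dedup.metrics.txt` are hypothetical, and `REMOVE_DUPLICATES=true` is one possible choice (leaving it off only flags duplicates). The helper `dedup_cmd` is also hypothetical; it just prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch: build a Picard MarkDuplicates command for a coordinate-sorted BAM.
# EBROOTPICARD is expected to be set by "module load picard/..."; the
# fallback path below is a placeholder, not a real location.

prx=${1:-sample}      # sample prefix, as in the original script
dir2=${dir2:-.}       # working directory, as in the original script

# dedup_cmd INPUT_BAM OUTPUT_BAM METRICS_FILE
# Prints the command line instead of running it (dry run).
dedup_cmd() {
    printf 'java -Xmx30g -jar %s/picard.jar MarkDuplicates I=%s O=%s M=%s REMOVE_DUPLICATES=true\n' \
        "${EBROOTPICARD:-/path/to/picard}" "$1" "$2" "$3"
}

# Print the command for the read-group-tagged BAM produced earlier.
dedup_cmd "$dir2/$prx.addrg.sort.bam" "$dir2/$prx.dedup.bam" "$dir2/$prx.dedup.metrics.txt"
```

Once the printed command looks right, it can be run directly (for example by piping the output to `sh`), followed by `samtools index` on the deduplicated BAM rather than on the original one.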

enluo211 commented 6 months ago

> Yes, you should generally deduplicate your BAM files, if that answers your question.

Thank you, sir!