Error running gatk MarkDuplicates on the possorted_bam.bam from cellranger-atac count

sunshine1126 commented 3 years ago

Hello, I get an error when I run gatk MarkDuplicates on the possorted_bam.bam from cellranger-atac count. The error persists even after I sort possorted_bam.bam with samtools. Can you help me resolve this problem? Thanks

seasoncloud commented 3 years ago

Hello! I also ran java -jar picard.jar MarkDuplicates directly on the possorted_bam.bam file from cellranger but did not have this issue. Could you share the code you used to run MarkDuplicates? Thanks

sunshine1126 commented 3 years ago

Hello @seasoncloud, thanks for your reply. My code is as follows.

# install gatk (version 4.2.2.0)
conda create -n GATK4 gatk4
conda activate GATK4

inpath=~/data/scATAC-seq/analysis/cellrangeratac_count_results/G1/outs
out_path=~/data/scATAC-seq/analysis/mutation/hg38
sample_name=G1

# mark and remove duplicates
echo "start MarkDuplicates for ${sample_name}"
gatk --java-options "-Xmx128G" MarkDuplicates \
     -I "${inpath}/possorted_bam.bam" \
     -M "${out_path}/${sample_name}.possorted_rmdup_marked_dup_metrics.txt" \
     --VALIDATION_STRINGENCY SILENT \
     --REMOVE_DUPLICATES true \
     -O "${out_path}/${sample_name}.possorted_rmdup.bam"

In addition, I have another question about the .fa file for gatk HaplotypeCaller. I get an error if I use the .fa file from refdata-cellranger-arc-GRCh38-2020-A-2.0.0/fasta/genome.fa. Should I use the references from the GATK website instead? Thanks again!

seasoncloud commented 3 years ago

Not sure if this is the cause of the MarkDuplicates issue, but I used java -jar picard.jar MarkDuplicates rather than the gatk one. You can find more details here: https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-
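
For comparison, a standalone Picard call might look roughly like the sketch below. The jar location (PICARD_JAR), the file paths, and the picard_markdup helper name are placeholders rather than values from this thread. It uses Picard's classic KEY=value argument style, and only the three basic arguments are shown.

```shell
# Sketch: run Picard MarkDuplicates on a Cell Ranger ATAC BAM.
# PICARD_JAR and all paths are placeholders, not values from this thread.
PICARD_JAR=${PICARD_JAR:-picard.jar}

# Usage: picard_markdup <input.bam> <output.bam> <metrics.txt>
# Set DRY_RUN=1 to print the command instead of running it.
picard_markdup() {
    # Rebuild the positional parameters as the full command line.
    set -- java -jar "$PICARD_JAR" MarkDuplicates \
        I="$1" O="$2" M="$3"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example (hypothetical file names):
#   picard_markdup possorted_bam.bam G1.marked.bam G1.dup_metrics.txt
```

Running it once with DRY_RUN=1 shows the exact command line before committing to a long job on a large BAM.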

For HaplotypeCaller, it's best to use the same version of the .fa file as the one you used for the alignment (the BAM file). I think you used a different version of the reference genome (hg19) when you did the alignment.
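
One way to check whether a BAM and a FASTA belong to the same genome build is to compare contig names and lengths from the BAM header against the FASTA index. The sketch below assumes samtools is installed and that the index was created with samtools faidx. The helper names and file names are illustrative only. It is a coarse check that ignores contig order. Note also that GATK expects a .fai index and a .dict sequence dictionary next to the FASTA, so a missing one of those is another possible cause worth ruling out.

```shell
# Sketch: compare contigs in a BAM header with a FASTA index (.fai).
# Helper names and paths are illustrative; bam_contigs needs samtools on PATH.

# Convert @SQ header lines on stdin to "name<TAB>length".
sq_to_table() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }'
}

# Print "name<TAB>length" for every contig in a BAM header.
bam_contigs() {
    samtools view -H "$1" | sq_to_table
}

# Print "name<TAB>length" from a .fai index (its first two columns).
fai_contigs() {
    cut -f1,2 "$1"
}

# Succeed only if both tables list the same contigs with the same lengths.
same_contigs() {
    a=$(sort "$1")
    b=$(sort "$2")
    [ "$a" = "$b" ]
}

# Example (hypothetical file names):
#   samtools faidx genome.fa
#   bam_contigs possorted_bam.bam > bam.tsv
#   fai_contigs genome.fa.fai > ref.tsv
#   same_contigs bam.tsv ref.tsv && echo "same contigs" || echo "contigs differ"
```

If the tables differ, looking at the chr1 length in each is usually enough to tell which build each file comes from.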

seasoncloud / Basic_CNV_SNV