lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences
https://lh3.github.io/minimap2
Other
1.81k stars 415 forks

Nanopore cDNA PCR data mapping #1237

Jean497 closed 1 month ago

Jean497 commented 2 months ago

Hi, I have recently been analyzing Nanopore cDNA PCR data. When I processed NGS data, I used Picard to remove duplicates, so I ran Picard on the ONT cDNA data as well. Most of the reads were removed, so I am not sure whether Picard is appropriate for ONT cDNA PCR data. Here are the command lines I used:

{minimap2} -t {processor} -ax splice --secondary=no --cs {ref} {out_name}.min120.Q12.fastq -o {out_name}.minimap2.cs.sam

{java} -jar {picard} MarkDuplicates -I {out_name}.minimap2.cs.unique.sort.bam -O {out_name}.minimap2.cs.sort.makdup.bam -M {out_name}_duplicate_metric --VALIDATION_STRINGENCY SILENT --TMP_DIR ./

Here are the results:


$ samtools flagstats 293.minimap2.cs.sort.dedup.bam
13272796 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
9213321 + 0 duplicates
13272796 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

That is far too many duplicates: 9,213,321 of 13,272,796 reads (about 69%) are flagged. I don't know whether Picard is suitable for ONT data. Does anyone have experience with other tools?
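
For anyone checking the same numbers, the duplicate rate can be worked out directly from the flagstat counts. Below is a minimal Python sketch, assuming the `<passed> + <failed> <label>` layout of the output above. `duplicate_fraction` is a hypothetical helper name used only for illustration; it is not part of samtools or Picard.

```python
import re


def duplicate_fraction(flagstat_text: str) -> float:
    """Return duplicates / total using the QC-passed counts in flagstat text."""
    counts = {}
    # Each entry looks like "<passed> + <failed> <label>"; the pattern does not
    # depend on line breaks, so it also works on output pasted as one line.
    for passed, label in re.findall(r"(\d+) \+ \d+ (in total|duplicates)", flagstat_text):
        counts[label] = int(passed)
    return counts["duplicates"] / counts["in total"]


# Counts copied from the flagstat output in this issue (first five lines only).
report = """13272796 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
9213321 + 0 duplicates
13272796 + 0 mapped (100.00% : N/A)"""

print(f"{duplicate_fraction(report):.1%}")  # prints 69.4%
```
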

Would appreciate help!

Thanks, Jean

lh3 commented 1 month ago

This is irrelevant to minimap2. Don't deduplicate.