haowenz / chromap

Fast alignment and preprocessing of chromatin profiles
https://haowenz.github.io/chromap/
MIT License
192 stars, 21 forks

[BUG] summary and log are confusing. #143

Open ming1211 opened 1 year ago

ming1211 commented 1 year ago

The generated summary file is shown below. I'm confused about why the unmapped count is negative. (The barcode field is empty.)

barcode,total,duplicate,unmapped,lowmapq
,66319792,2753815,-54606110,12185252

The log is as follows:

Number of reads: 132639584.
Number of mapped reads: 120925902.
Number of uniquely mapped reads: 109487752.
Number of reads have multi-mappings: 11438150.
Number of candidates: 1347489938.
Number of mappings: 120925902.
Number of uni-mappings: 109487752.
Number of multi-mappings: 11438150.
Sorted, deduped and outputed mappings in 482.28s.
# uni-mappings: 107154670, # multi-mappings: 10900787, total: 118055457.
Number of output mappings (passed filters): 105986835
Total time: 1821.25s.

My data is paired-end. The numbers in the log seem to count reads rather than pairs, while the numbers in the summary seem to be a mix of pairs and reads. Another question: why is the number of output mappings in the final result odd rather than even? Does that mean some reads are not in pairs?

Number of output mappings (passed filters): 105986835

My command is:

chromap --preset chip -t 12 --MAPQ-threshold 10 -x $chromap_index -r $genome -1 A_R1.fastq.gz -2 A_R2.fastq.gz --SAM -o A_chromap.sam --trim-adapters --summary A-summary

Thanks in advance.

Best, Ming

haowenz commented 1 year ago

@mourisl Can you take a look? Thanks!

mourisl commented 12 months ago

@ming1211 Sorry for the delayed reply. The summary should be reported in terms of read pairs. I think the negative number in the summary is a bug that shows up when the output is in SAM format. I will look into this issue. Thank you for reporting it.
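The units question can be checked directly against the figures quoted in the summary and log above. The sketch below is only an arithmetic observation on those numbers, not a description of chromap's internals. The variable names are made up for illustration, and the last step assumes (without confirmation from the thread) that the log counts two mapped reads per mapped pair.

```python
# Figures copied from the summary file and log quoted in this thread.
summary_total = 66_319_792      # "total" column of the summary
summary_unmapped = -54_606_110  # "unmapped" column of the summary
log_reads = 132_639_584         # "Number of reads" in the log
log_mapped = 120_925_902        # "Number of mapped reads" in the log

# The summary total is exactly half the log's read count, so it counts pairs.
total_is_pairs = (2 * summary_total == log_reads)

# Subtracting the log's mapped count (reads) from the pair total
# reproduces the negative "unmapped" value in the summary.
mixed_units_unmapped = summary_total - log_mapped

# ASSUMPTION (not confirmed in the thread): the log counts two mapped
# reads per mapped pair, so halving it gives mapped pairs.
assumed_mapped_pairs = log_mapped // 2
assumed_unmapped_pairs = summary_total - assumed_mapped_pairs

print("summary total counts pairs:", total_is_pairs)
print("pairs minus mapped reads:", mixed_units_unmapped)
print("unmapped pairs under the assumption:", assumed_unmapped_pairs)
```

The numbers are consistent with a pair count and a read count being mixed in one subtraction. The thread does not state the actual cause that was fixed.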

mourisl commented 11 months ago

Hi @ming1211, thank you for letting us know about the negative-number bug. I think I've found the issue, and it should be fixed in the li_dev5 branch. Could you please check out that branch and try it on your data to see whether it works? Thank you.

ming1211 commented 10 months ago

Thanks for the update! I tried it, and it works perfectly now. I have another question: the ATAC preset includes two parameters, --remove-pcr-duplicates and --remove-pcr-duplicates-at-cell-level. If I am processing bulk ATAC-seq data, will --remove-pcr-duplicates-at-cell-level have a negative effect?

Thanks again! Ming

ming1211 commented 10 months ago

To be clear, my data is bulk, not single-cell.

mourisl commented 10 months ago

Since your data is bulk ATAC-seq, you should use --remove-pcr-duplicates. The --remove-pcr-duplicates-at-cell-level option is for scATAC-seq data. I guess it will give you the same results as --remove-pcr-duplicates on bulk data, but we have never tested it.
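As a minimal sketch of that advice, the snippet below assembles a bulk ATAC-seq command line that passes --remove-pcr-duplicates explicitly and leaves out the cell-level option. It uses only flags that already appear in this thread. All file names are hypothetical placeholders, and the other settings a real run needs (for example the rest of what the ATAC preset bundles) are not covered here.

```python
import shlex

# Hypothetical placeholder paths; substitute your own files.
index_path = "genome.index"
reference_path = "genome.fa"
read1_path = "bulk_R1.fastq.gz"
read2_path = "bulk_R2.fastq.gz"
output_path = "bulk_atac.out"

# Flags below are the ones quoted in this thread; the cell-level
# duplicate-removal flag is deliberately omitted for bulk data.
cmd = [
    "chromap",
    "-t", "12",
    "-x", index_path,
    "-r", reference_path,
    "-1", read1_path,
    "-2", read2_path,
    "--trim-adapters",
    "--remove-pcr-duplicates",
    "-o", output_path,
]

# Print a shell-safe version of the command for review before running it.
print(shlex.join(cmd))
```

Building the argument list first makes it easy to confirm which duplicate-removal flag is in effect before launching a long alignment.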