r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0

Bam remove duplicate #140

Closed TheSallyGardens closed 4 years ago

TheSallyGardens commented 4 years ago

Hi. I have finished running the cellranger-atac pipeline. Is it necessary to remove duplicates from the BAM file?

As far as I know, SnapATAC does not remove duplicates.

r3fang commented 4 years ago

Hi,

SnapATAC does remove duplicates when running snaptools pre, so you do not need to remove duplicates yourself before running snaptools pre.

TheSallyGardens commented 4 years ago

Hi,

Thank you for your answer! What software do you use to remove duplicates? Does SnapATAC report statistics on duplicates?

r3fang commented 4 years ago

Yes,

SnapTools removes duplicates for each cell separately. Within a given cell, two fragments are considered duplicates if their genomic position and strand are identical. The duplicate information is stored in the metadata of the resulting snap file. You can inspect the quality metrics of each barcode in SnapATAC:

library(SnapATAC)
x.sp = createSnap("your_snap_file.snap", sample="your_snap_file")
x.sp@metaData
##            barcode  TN UM PP UQ CM
## 1 AAAAAAAAAAAAAAAA 102 44 44 44  0
## 2 AAAAAAAAAAAAAAAC  10  6  6  6  0
## 3 AAAAAAAAAAAAAAAG   3  0  0  0  0
## 4 AAAAAAAAAAAAAAAT   2  1  1  1  0
## 5 AAAAAAAAAAAAAACA   6  1  1  1  0
## 6 AAAAAAAAAAAAAACC   1  1  1  1  0
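To illustrate the per-cell rule, here is a minimal base-R sketch on a small made-up fragment table. The column names `barcode`, `chrom`, `start`, `end` and `strand` are assumptions for illustration only; this is not how SnapTools is implemented internally. Because the barcode is part of the key along with position and strand, identical fragments from different cells are not counted as duplicates:

```r
# Hypothetical fragment table, one row per fragment (made-up data; the
# column layout is an assumption, not the snap file format).
frags <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAC", "AAAG", "AAAG"),
  chrom   = c("chr1", "chr1", "chr1", "chr1", "chr1"),
  start   = c(100L, 100L, 100L, 100L, 250L),
  end     = c(300L, 300L, 300L, 300L, 400L),
  strand  = c("+", "+", "-", "+", "+"),
  stringsAsFactors = FALSE
)

# Duplicate key = cell barcode + genomic position + strand.
# Row 2 repeats row 1 in the same cell, so it is dropped; row 3 differs in
# strand and row 4 is in a different cell, so both are kept.
key   <- frags[, c("barcode", "chrom", "start", "end", "strand")]
dedup <- frags[!duplicated(key), ]

# Per-cell fragment counts before and after deduplication
before <- table(frags$barcode)
after  <- table(dedup$barcode)
print(before)
print(after)
```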

The duplicate ratio can be calculated as:

1 - (x.sp@metaData[, "UQ"] + 1) / (x.sp@metaData[, "PP"] + 1)
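As a rough sketch, the formula can be applied to the metaData values listed above. This assumes that `PP` counts properly paired fragments and `UQ` counts the unique fragments left after per-cell deduplication. The +1 in numerator and denominator acts as a pseudocount, so a barcode with `PP = 0` (the third row) gets a ratio of 0 instead of `NaN`:

```r
# Per-barcode counts copied from the metaData listing above.
# Assumed column meanings: PP = properly paired fragments,
# UQ = unique fragments remaining after per-cell deduplication.
md <- data.frame(
  barcode = c("AAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAC", "AAAAAAAAAAAAAAAG",
              "AAAAAAAAAAAAAAAT", "AAAAAAAAAAAAAACA", "AAAAAAAAAAAAAACC"),
  TN = c(102, 10, 3, 2, 6, 1),
  UM = c(44, 6, 0, 1, 1, 1),
  PP = c(44, 6, 0, 1, 1, 1),
  UQ = c(44, 6, 0, 1, 1, 1),
  CM = c(0, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Duplicate ratio with the +1 pseudocount, as in the formula above
md$dup_rate <- 1 - (md$UQ + 1) / (md$PP + 1)

# Without the pseudocount, a barcode with PP = 0 yields 0/0 = NaN
raw_rate <- 1 - md$UQ / md$PP

print(md[, c("barcode", "PP", "UQ", "dup_rate")])
```

With a real snap object, the same expression can be run on `x.sp@metaData` instead of `md`. In these six barcodes `UQ` equals `PP`, so every ratio is 0.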
TheSallyGardens commented 4 years ago

Thanks!