r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0

inconsistency between raw and binary bmat matrix #69

Closed znavidi closed 5 years ago

znavidi commented 5 years ago

Hi,

I analyzed a real scATAC-seq sample and wrote its raw bmat matrix to a file before converting it to binary. Then I ran x.sp = makeBinary(x.sp, mat="bmat") and wrote the result to a new file. The problem is that when I read both files in Python and set every element of the raw cell-by-bin matrix that is greater than 0 to 1, the row sums of this new matrix do not exactly match the row sums of the binary cell-by-bin matrix. I cannot understand the reason. Is there any filtering step in the makeBinary function? I would appreciate your help.

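Anyway, this is the Python code I use to compare the raw and binary bmat matrices:

    import numpy as np

    # Load the raw counts and the matrix written after makeBinary
    raw = np.load('raw_bmat_file')
    binary = np.load('binary_bmat_file')

    # Threshold the raw counts: any count > 0 becomes 1
    raw_to_bin = (raw > 0).astype(np.uint8)
    np.all(raw_to_bin == binary)  # the output is False!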

Best, Zeinab
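
When an element-wise `np.all` check returns False, it helps to see which rows disagree and what the raw counts are at those positions. The following is a minimal sketch using small made-up arrays in place of the saved matrices; the function name `compare_binarized` is hypothetical and not part of SnapATAC.

```python
import numpy as np

def compare_binarized(raw, binary):
    """Report where thresholding `raw` at > 0 disagrees with `binary`."""
    raw_to_bin = (raw > 0).astype(np.uint8)
    mismatch = raw_to_bin != binary
    # Indices of rows (cells) containing at least one disagreement
    rows = np.flatnonzero(mismatch.any(axis=1))
    # Raw counts at the disagreeing positions
    counts = raw[mismatch]
    return rows, counts

# Toy data (made up): the two matrices differ in a single entry of row 1.
raw = np.array([[0, 2, 1], [9, 0, 1]])
binary = np.array([[0, 1, 1], [0, 0, 1]], dtype=np.uint8)
rows, counts = compare_binarized(raw, binary)
print(rows, counts)
```

Looking at the raw counts at the mismatching positions shows whether the differences cluster at particular count values rather than being scattered at random.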

znavidi commented 5 years ago

I would appreciate it if you have any idea what the reason is.

thanks

r3fang commented 5 years ago

The raw count is counted before filtering duplicates, erroneous alignments, etc. Each element in the cell-by-bin matrix is the number of fragments overlapping that bin. A fragment can span more than one bin, in which case it is counted once in each bin it overlaps.

-- Rongxin Fang, Ren Lab Ludwig Cancer Research Bioinformatics Ph.D. Student University of California, San Diego
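
To illustrate the point about fragments spanning bins, here is a small sketch of overlap counting with fixed-width bins. It assumes half-open fragment coordinates `[start, end)` and uses made-up fragments; the helper names are hypothetical, and the 5000 bp bin size only mirrors the `bin.size=5000` used elsewhere in this thread.

```python
def bins_overlapped(start, end, bin_size=5000):
    """Indices of fixed-width bins overlapped by the half-open interval [start, end)."""
    first = start // bin_size
    last = (end - 1) // bin_size
    return list(range(first, last + 1))

def count_per_bin(fragments, bin_size=5000):
    """Count, for each bin, how many fragments overlap it."""
    counts = {}
    for start, end in fragments:
        for b in bins_overlapped(start, end, bin_size):
            counts[b] = counts.get(b, 0) + 1
    return counts

# Three made-up fragments; the first one straddles the boundary at 5000.
fragments = [(4900, 5100), (100, 300), (5200, 5400)]
counts = count_per_bin(fragments)
print(counts)  # the per-bin counts sum to 4 although there are only 3 fragments
```

Because the straddling fragment contributes to two bins, the sum over a row of the count matrix can exceed the number of fragments in that cell.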

On Aug 23, 2019, at 2:25 PM, znavidi notifications@github.com wrote:

I would appreciate it if you have any idea what the reason is.

thanks


znavidi commented 5 years ago

I may have asked my question ambiguously! Basically, the code is:

    x.sp = createSnap(
        file="GSE96769.snap",
        sample="lable",
        num.cores=1
    )
    x.sp = addBmatToSnap(x.sp, bin.size=5000)
    raw <- apply(x.sp@bmat[1:100,], 1, function(c) sum(c != 0))
    head(raw)
    # [1] 5155 6080 6520 2578 5486 6422

    x.sp = makeBinary(x.sp, mat="bmat")
    bin <- Matrix::rowSums(x.sp@bmat)
    head(bin)
    # [1] 5150 6076 6515 2575 5483 6417

As you can see, just running makeBinary changes the number of nonzero elements in each row of the cell-by-bin matrix slightly: by 5, 4, 5, 3, 3, 5, ... Do you know the reason? Is it what you explained?

I also have a question: after creating the snap file and binarizing it, some barcode rows might have no read counts at all (every element in the row is zero), right?

Best, Zeinab

r3fang commented 5 years ago

> and as you can see just by running makeBinary function, the number of non zero elements in the cell by bin matrix changes a bit like 5, 4, 5, 3, 3, 5, ... do you know what is the reason? is it what you explained?!

I don’t see anything wrong with this.

> I also have a question: after creating snap file and binarizing it there might exist some barcode rows with no read count, right? (like all the elements in that row are zero)

It’s possible but highly unlikely. After you filter low-quality barcodes, each barcode usually has at least 1,000 fragments, so an all-zero row is almost impossible.
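
A quick way to check for barcodes without any coverage is to look at the row sums of the cell-by-bin matrix. This is a minimal sketch on a made-up dense matrix; the variable names are illustrative, and a real bmat would normally be stored as a sparse matrix.

```python
import numpy as np

# Toy cell-by-bin matrix (made up): row 1 has no coverage at all.
bmat = np.array([[1, 0, 1],
                 [0, 0, 0],
                 [0, 1, 0]], dtype=np.uint8)

row_cov = bmat.sum(axis=1)                 # covered bins per barcode
empty_rows = np.flatnonzero(row_cov == 0)  # barcodes with all-zero rows
filtered = bmat[row_cov > 0]               # keep only barcodes with coverage

print(empty_rows, filtered.shape)
```

If all-zero rows show up after barcode filtering, that suggests checking whether the filtering step was actually applied to the object being inspected.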


znavidi commented 5 years ago

If you just binarize the matrix, the number of nonzero elements in each row should stay the same, right? But it changes slightly, and I cannot understand why!

This is strange, because when analyzing GSE96769 by following exactly the steps on your website, I see that the last 5 rows are completely zero. Do you have any idea why?

Best, Zeinab

r3fang commented 5 years ago

Binarization does not simply convert the count matrix to binary; it also removes entries that have very high coverage. That could be one reason, but I cannot be sure about it.
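
The effect described here can be reproduced with a small sketch. This is not SnapATAC's actual implementation: the quantile-based cutoff, the function name `binarize_with_cutoff`, and the parameter `outlier_filter` are assumptions chosen only to illustrate how removing very high counts before binarizing lowers the number of nonzero entries in some rows.

```python
import numpy as np

def binarize_with_cutoff(counts, outlier_filter=1e-3):
    """Zero out the highest counts, then set every remaining nonzero entry to 1.

    Assumption: the cutoff is the (1 - outlier_filter) quantile of the
    nonzero counts, and never less than 1.
    """
    nonzero = counts[counts > 0]
    cutoff = max(1, np.quantile(nonzero, 1 - outlier_filter))
    out = counts.copy()
    out[out > cutoff] = 0   # drop very high coverage entries
    out[out > 0] = 1        # binarize what is left
    return out.astype(np.uint8)

# Toy counts (made up); a large outlier_filter is used so the effect is visible.
counts = np.array([[1, 2, 0, 50],
                   [1, 1, 3, 0]])
before = (counts != 0).sum(axis=1)
after = binarize_with_cutoff(counts, outlier_filter=0.2).sum(axis=1)
print(before, after)
```

In the toy data only the entry with count 50 exceeds the cutoff, so the first row loses one nonzero element while the second row is unchanged, which is the same kind of small per-row drop reported in this thread.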


znavidi commented 5 years ago

That makes sense, thank you :)