BenxiaHu closed 2 years ago
Hi, thank you for your interest in our tool!
The binarized sparse matrix is the feature (peak)-by-cell read count matrix in which every count of 1 or more is set to 1. The matrix is therefore binary: each entry is either 1 or 0.
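To make the binarization concrete, here is a minimal sketch in plain Python. It is not the SCAVENGE code: the dictionary-based sparse layout, the `binarize` helper, and the toy counts are all assumptions made up for illustration.

```python
# Toy peak-by-cell count matrix stored sparsely as {(peak, cell): count}.
# Entries that are absent are implicit zeros. All values are hypothetical.
counts = {
    (0, 0): 3,
    (0, 2): 1,
    (1, 1): 2,
    (2, 0): 1,
    (2, 1): 5,
}


def binarize(sparse_counts):
    """Return a sparse matrix in which every nonzero count becomes 1."""
    return {key: 1 for key, value in sparse_counts.items() if value > 0}


binary = binarize(counts)
print(binary)
```

Only the stored (nonzero) entries are touched, so the sparsity pattern of the matrix does not change; a peak is simply marked as accessible or not in each cell.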
Yes, it is slightly different from the standard IDF. This version of the IDF calculation is commonly used for dimension reduction of scATAC-seq data and performs well; see, for example, Cusanovich2018 and Stuart2021. There are in fact many variants of the TF-IDF calculation (link). I expect their performance to differ slightly, although I do not have a comprehensive benchmark.
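As a rough illustration of how two IDF variants can differ, the sketch below compares the form quoted in the question, log(N/(df_i + 1)), with another variant, log(1 + N/df_i). The second form is only an assumption chosen for illustration; this thread does not state the exact formula SCAVENGE uses, so please check the paper or the source code for that. The function names and the value of `n_cells` are hypothetical.

```python
import math


def idf_from_question(n_cells, df):
    """IDF as quoted in the question: log(N / (df + 1))."""
    return math.log(n_cells / (df + 1))


def idf_log1p_ratio(n_cells, df):
    """Assumed alternative variant: log(1 + N / df). Requires df > 0."""
    return math.log(1 + n_cells / df)


n_cells = 1000  # hypothetical number of cells
for df in (1, 10, 100, 999):  # number of cells in which a peak is accessible
    a = idf_from_question(n_cells, df)
    b = idf_log1p_ratio(n_cells, df)
    print(f"df={df:4d}  log(N/(df+1))={a:.3f}  log(1+N/df)={b:.3f}")
```

The two variants rank peaks in the same order, but they diverge for peaks that are accessible in almost every cell: the first form goes to zero (and turns negative when df equals N), while the second stays positive.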
Hope this is helpful. Please let me know if you have any other questions. Thanks!
Thanks a lot for your explanation. Could you explain a little about how you build the nearest-neighbor graph from the LSI matrix of N cells and d leading LSI components (d = 30)? The paper does not seem to mention how the LSI matrix is obtained. Maybe I missed some important steps. Best,
Hello Fulong, SCAVENGE is a good tool for deciphering the function of genetic variants at the single-cell level. I have two questions about the SCAVENGE algorithm. 1: What is the binarized sparse matrix? 2: You used TF-IDF to calculate the weight for each feature, but the IDF in your paper looks a little different from log(N/(df_i + 1)).