kkdey / GSSG

Gene Set + S2G strategy annotations analyzed for disease architecture

data normalization and variable gene selection in NMF and joint NMF #21

Closed KoichiHashikawa closed 1 year ago

KoichiHashikawa commented 1 year ago

Hello @kkdey,

I really appreciate your significant contribution to the community and thank you for sharing the fantastic sc-linker programs. I posted this question both here and in scgenetics, as I am not sure which repository is the relevant one.

I have the following questions regarding NMF, specifically about some lines in "jointNMF.py" (in https://github.com/karthikj89/scgenetics/tree/master/src/jointNMF) and "1.generateNMFModules.ipynb" (https://github.com/karthikj89/scgenetics/blob/master/src/1.generateNMFModules.ipynb).

1) Data normalization

In the Methods section of the sc-linker paper, the authors state that log-normalized expression data was used. But in "1.generateNMFModules.ipynb", the raw count matrix (X) is instead scaled by its maximum (X/np.max(X)). "jointNMF.py" uses a similar normalization:

    self.Xh = scipy.sparse.csr_matrix(Xh).copy()
    self.Xd = scipy.sparse.csr_matrix(Xd).copy()
    self.Xh = self.Xh/np.max(self.Xh)
    self.Xd = self.Xd/np.max(self.Xd)

I think log-normalized data should be used here to correct for sequencing depth and similar effects. Which one was actually used? What is the rationale for the normalization in your code?
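To make the difference concrete, here is a minimal sketch contrasting the two normalizations on a toy cells-by-genes matrix. The function names `max_scale` and `log_normalize` and the `target_sum` of 1e4 are my own assumptions for illustration. They are not taken from the sc-linker code, and the log-normalization shown is only the common "scale each cell to a fixed total, then log1p" recipe.

```python
import numpy as np

def max_scale(counts):
    """Divide the whole matrix by its single largest entry,
    mirroring the X / np.max(X) step quoted above."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.max()

def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x).
    target_sum is an assumed, conventional value."""
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)
    depth[depth == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / depth * target_sum)

# Toy data: cell 1 has the same composition as cell 0 but 10x the depth.
X = np.array([[1, 2, 7],
              [10, 20, 70]])

X_max = max_scale(X)      # rows still differ by the 10x depth factor
X_log = log_normalize(X)  # rows become identical once depth is removed
```

Both outputs are non-negative, so either can be passed to NMF. The difference is that global max scaling leaves per-cell depth differences in the matrix, while the per-cell scaling removes them.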

2) Data slicing

In "1.generateNMFModules.ipynb", the data matrix is further subset to the highly variable genes:

    tissueadata = tissueadata[:,tissueadata.var['highly_variable']]

I did not see this line in "jointNMF.py". Did you use all genes in jointNMF, or did you also subset to the variable genes?
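For reference, the kind of slicing I mean can be sketched with plain NumPy as below. The helper `top_variable_gene_mask` and its variance-to-mean dispersion ranking are hypothetical simplifications of my own. They stand in for whatever procedure populated the `highly_variable` column in the notebook, which I have not verified.

```python
import numpy as np

def top_variable_gene_mask(X, n_top):
    """Return a boolean mask over genes (columns) marking the n_top genes
    with the highest variance-to-mean ratio. Simplified stand-in only."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    dispersion = np.zeros_like(mean)
    expressed = mean > 0  # genes with zero mean keep dispersion 0
    dispersion[expressed] = var[expressed] / mean[expressed]
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[np.argsort(dispersion)[::-1][:n_top]] = True
    return mask

# Toy cells-by-genes counts: gene 3 is never expressed,
# gene 1 is strongly up in half of the cells.
rng = np.random.default_rng(0)
X = rng.poisson(lam=2.0, size=(50, 6)).astype(float)
X[:, 3] = 0.0
X[:25, 1] += 30.0

hv_mask = top_variable_gene_mask(X, n_top=3)
X_hv = X[:, hv_mask]  # same column slicing pattern as the notebook line
```

If jointNMF is run on all genes while the single-dataset NMF is run on the sliced matrix, the two sets of programs are learned over different gene sets, which is why I am asking.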

3) Notebook version of jointNMF

As with the other gene program computations, could you also post a notebook version of jointNMF? That would help the community a lot.

Thanks so much! Koichi

kkdey commented 1 year ago

@KoichiHashikawa This is more related to the scgenetics part of the pipeline. I see you have already posted there. I will ask Karthik to get back to you asap!

KoichiHashikawa commented 1 year ago

@kkdey Thanks! I really appreciate it!