neurorestore / Augur

Cell type prioritization in single-cell data
MIT License

Question about select_variance() #29

Open kleejin opened 10 months ago

kleejin commented 10 months ago

Hi there. I have been exploring how Augur filters out genes, and so I have been looking more closely at the select_variance function. I ended up on this track because one of the cell types I expected Augur to rank highly (based on a previous DE gene analysis) was not performing well. When I dug into Augur's output, it looked like the most significant DE genes from that previous analysis were being filtered out at the default var_quantile threshold of 0.5. So I've been trying to understand why.

Upon looking more closely at the select_variance() function, it looks like in the comments you intend to calculate CV (sds / mean), but in the actual code, you are calculating a form of Z-score (mean / sds). If this is what you are meaning to calculate, can you explain a little more conceptually how this is filtering out the genes with the lowest variance? Using (sds / mean) makes more sense to me, but perhaps I am misunderstanding the goal of this step.
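To make the difference concrete, here is a minimal sketch in plain Python. It is not Augur's code (Augur is an R package, and select_variance() may well do more than this). The toy expression values, the gene names, and the helpers `cv`, `inverse_cv`, and `keep_top_half` are all made up for illustration, and a simple median cutoff stands in for a var_quantile of 0.5.

```python
from statistics import mean, median, stdev

# Toy expression values per gene across five cells (illustrative only).
expr = {
    "gene_flat":     [10.0, 10.2, 9.8, 10.1, 9.9],  # high mean, almost no spread
    "gene_moderate": [4.0, 6.0, 5.0, 7.0, 3.0],     # moderate spread
    "gene_variable": [0.0, 8.0, 1.0, 9.0, 2.0],     # large spread relative to mean
    "gene_bimodal":  [0.0, 0.0, 12.0, 0.0, 13.0],   # expressed in a subset of cells
}

def cv(values):
    """Coefficient of variation: sd / mean."""
    return stdev(values) / mean(values)

def inverse_cv(values):
    """The inverted ratio described in the issue: mean / sd."""
    return mean(values) / stdev(values)

def keep_top_half(scores):
    """Keep genes scoring strictly above the median (a stand-in for a 0.5 quantile cutoff)."""
    cutoff = median(scores.values())
    return sorted(gene for gene, score in scores.items() if score > cutoff)

kept_by_cv = keep_top_half({g: cv(v) for g, v in expr.items()})
kept_by_inverse = keep_top_half({g: inverse_cv(v) for g, v in expr.items()})

print("kept with sd / mean:", kept_by_cv)
print("kept with mean / sd:", kept_by_inverse)
```

On this toy data the two ratios keep opposite halves of the genes: sd / mean retains the variable genes, while mean / sd retains the flat ones. That would be consistent with strongly DE genes being dropped, assuming the filter keeps the top-scoring half.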

Thank you for maintaining this tool and thank you for your help!

AlanTeoYueYang commented 8 months ago

Hi @kleejin, thank you for your comment, and sorry for the delay in getting back to you. I looked into the issue, and you are correct that this is a bug. That said, the bug seems to have little to no effect on the relative rankings within the cell type prioritisation, although it does increase the AUC globally, which is not surprising. The fix you suggested will be included in an upcoming release. If you would rather not wait, we can push the fix to a branch, or you can apply it locally and reinstall with devtools.

We implemented the fix and applied it to a dataset mentioned in the paper (Kang et al. 2018); the results are shown below.

[image]