neurorestore / Augur

Cell type prioritization in single-cell data
MIT License

Feature_perc #20

Open amissarova opened 1 year ago

amissarova commented 1 year ago

Hi!

I was wondering about the rationale for the default feature_perc = 0.5 rather than 1. Are there any particular reasons to randomly select features (besides computational complexity)?

Thanks!

skinnider commented 1 year ago

Nope, just a way to reduce the runtime.

amissarova commented 1 year ago

cool, thanks

amissarova commented 1 year ago

Related question: I just tried running Augur with feature_perc = 1. I would have expected to now get an importance score for every gene in every subsample and every fold, but that is not the case (for some subsamples there is no entry for a given gene). Why? Thanks!

skinnider commented 1 year ago

By default 50% of genes will be filtered out with select_variance - are they there when setting var_quantile=0?

amissarova commented 1 year ago

Hey, I have now set feature_perc = 1 and var_quantile = 0, but for some genes I still don't have an importance score entry in some of the subsamples.

amissarova commented 1 year ago

I guess that possibly happens because of the initial hard-coded filtering of genes with no variance (within a given subsample)? Or are there other reasons?

skinnider commented 1 year ago

That seems plausible, yes.
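
To make the explanation in this thread concrete, here is a minimal Python sketch of how a gene can be missing from some subsamples even with feature_perc = 1 and var_quantile = 0. This is not Augur's code (Augur is an R package). The function `genes_kept`, its `seed` argument, and the toy data are hypothetical, and the quantile step is a simplified stand-in for select_variance. The only behavior taken from the thread is the order of ideas: drop genes with no variance in the subsample, optionally drop a low-variance fraction, then optionally keep a random fraction of features.

```python
import random
from statistics import pvariance


def genes_kept(expr, cells, var_quantile=0.0, feature_perc=1.0, seed=0):
    """Return the genes that survive filtering for one subsample of cells.

    expr  : dict mapping gene name -> list of expression values (one per cell)
    cells : indices of the cells drawn in this subsample
    """
    # Variance of each gene computed only on the subsampled cells.
    variances = {
        gene: pvariance([float(values[c]) for c in cells])
        for gene, values in expr.items()
    }

    # Assumed hard-coded step: genes that are constant in this subsample are dropped.
    kept = [gene for gene, var in variances.items() if var > 0]

    # Simplified stand-in for variance-based selection: drop the lowest fraction.
    if var_quantile > 0 and kept:
        ranked = sorted(kept, key=variances.get)
        kept = ranked[int(len(ranked) * var_quantile):]

    # Random feature selection, applied only when feature_perc < 1.
    if feature_perc < 1 and kept:
        rng = random.Random(seed)
        kept = rng.sample(kept, max(1, round(len(kept) * feature_perc)))

    return sorted(kept)


# Toy data: geneA is expressed only in the last two cells, geneC is never expressed.
expr = {
    "geneA": [0, 0, 0, 0, 5, 7],
    "geneB": [1, 2, 3, 4, 5, 6],
    "geneC": [0, 0, 0, 0, 0, 0],
}

print(genes_kept(expr, cells=[0, 1, 2, 3]))  # ['geneB']
print(genes_kept(expr, cells=[2, 3, 4, 5]))  # ['geneA', 'geneB']
```

In the first subsample geneA is constant (all zeros), so it is removed before any model is fit and cannot receive an importance score there, while in the second subsample it varies and is kept. Sparsely expressed genes are the ones most likely to drop in and out across subsamples in this way.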