Flu09 opened 2 months ago
Uh, couldn't you impute Sex from the expression of genes on chrY, excluding the pseudoautosomal regions? nCount and nGene could be associated with cellular biology. For instance, a cancer cell with whole-genome duplication is going to have a lot more reads than a diploid cell. That is probably not unwanted variation that should be adjusted for.
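A minimal Python sketch of the chrY idea, assuming per-sample pseudobulk counts (summed over cells, since per-cell chrY counts are sparse) exported as gene-to-count dicts. The marker gene list, the 10 CPM cutoff and the function names are hypothetical choices for illustration; in practice the cutoff would be checked against the samples whose Sex is already recorded.

```python
# Minimal sketch: impute a missing Sex label from pseudobulk counts of chrY genes
# outside the pseudoautosomal regions (PARs). The gene list and the 10 CPM cutoff
# are illustrative assumptions, not validated values.

# Hypothetical marker set: broadly expressed genes in the male-specific (non-PAR) part of chrY.
CHRY_NON_PAR_GENES = {"RPS4Y1", "DDX3Y", "KDM5D", "UTY", "EIF1AY", "USP9Y", "ZFY"}


def chry_cpm(counts):
    """Return non-PAR chrY counts per million for one sample's pseudobulk counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    y_counts = sum(n for gene, n in counts.items() if gene in CHRY_NON_PAR_GENES)
    return 1e6 * y_counts / total


def impute_sex(sample_counts, recorded_sex, threshold_cpm=10.0):
    """Keep recorded Sex values; call "M"/"F" from chrY expression where Sex is missing."""
    result = {}
    for sample, counts in sample_counts.items():
        sex = recorded_sex.get(sample)  # None plays the role of NA in the metadata
        if sex is None:
            sex = "M" if chry_cpm(counts) >= threshold_cpm else "F"
        result[sample] = sex
    return result


samples = {
    "s1": {"RPS4Y1": 500, "DDX3Y": 300, "ACTB": 99200},  # chrY genes clearly expressed
    "s2": {"XIST": 900, "ACTB": 99100},                  # no chrY signal
    "s3": {"XIST": 800, "ACTB": 99200},                  # Sex already recorded
}
print(impute_sex(samples, {"s3": "F"}))
# {'s1': 'M', 's2': 'F', 's3': 'F'}
```

Recorded values are never overwritten here; comparing the chrY call against them for the known samples is a cheap sanity check on the cutoff.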
@DarioS That is a good idea, thank you so much. What about batch as a variable (because of different runs or sequencing platforms)? Do you think it makes sense to include it, or is it better to add only biological ones?
I have 3 questions and I am hoping you can help me.
1) Suppose I want to remove the effect of a latent variable such as Sex, but Sex is not available for all of my samples: for some samples it is NA in the metadata data frame of my Seurat object.
Do I still include Sex in latent.vars, or do I need to remove the samples where Sex is unknown?
Would you suggest setting their value to "unknown" instead of leaving it empty (NA) and then including Sex as a variable, or should I remove those samples?
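The two options in question 1 can be sketched in Python, assuming the metadata is available as a list of per-cell records with None standing in for NA. With the "unknown" recoding, the affected cells are kept and, in a regression-style model, get their own indicator column instead of being folded into F or M. The function names are hypothetical and this is not Seurat's API.

```python
# Minimal sketch of the two options for a covariate with missing values.
# Records stand in for rows of the Seurat metadata; None plays the role of NA.


def drop_missing(records, var):
    """Option A: keep only cells whose covariate is known."""
    return [r for r in records if r.get(var) is not None]


def recode_missing(records, var, label="unknown"):
    """Option B: replace missing values with an explicit extra level."""
    return [{**r, var: label if r.get(var) is None else r[var]} for r in records]


def dummy_columns(records, var):
    """Treatment coding (expects no missing values): the first level alphabetically
    is the reference; every other level gets its own 0/1 indicator column."""
    levels = sorted({r[var] for r in records})
    others = levels[1:]
    rows = [[1 if r[var] == level else 0 for level in others] for r in records]
    return others, rows


cells = [
    {"cell": "c1", "group": "disease", "Sex": "F"},
    {"cell": "c2", "group": "control", "Sex": "M"},
    {"cell": "c3", "group": "disease", "Sex": None},
]

print(len(drop_missing(cells, "Sex")))  # 2
print(dummy_columns(recode_missing(cells, "Sex"), "Sex"))
# (['M', 'unknown'], [[0, 0], [1, 0], [0, 1]])
```

Either way, it is worth checking how the unknown samples are distributed across disease and control before choosing, since an extra level that lines up with one condition is partly confounded with it.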
2) If I use something such as sample or donor as a latent variable, does FindMarkers() actually find only the difference between the two idents I specified (disease vs. control) while cancelling the effect of everything else, such as sex or age? Or would it cancel everything?
3) I also want to ask about nCount and nGene as latent.vars: does it make sense to include them?
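Following the whole-genome-duplication point earlier in the thread, one rough check before adding depth to latent.vars is whether it differs systematically between the conditions being compared; if it does, adjusting for it may remove real biology rather than technical noise. A minimal Python sketch, assuming per-cell metadata records; the key names (nCount_RNA, group) and the numbers are toy assumptions, not real data.

```python
# Minimal sketch: compare sequencing depth between conditions before deciding
# whether to put it in latent.vars. Key names and values are assumptions.
from collections import defaultdict
from statistics import median


def median_by_group(cells, value_key, group_key):
    """Median of value_key within each level of group_key."""
    values = defaultdict(list)
    for cell in cells:
        values[cell[group_key]].append(cell[value_key])
    return {group: median(v) for group, v in values.items()}


cells = [
    {"group": "disease", "nCount_RNA": 12000},
    {"group": "disease", "nCount_RNA": 15000},
    {"group": "disease", "nCount_RNA": 18000},
    {"group": "control", "nCount_RNA": 6000},
    {"group": "control", "nCount_RNA": 7000},
]
print(median_by_group(cells, "nCount_RNA", "group"))
# {'disease': 15000, 'control': 6500.0}
```

A large gap like this one does not prove the difference is biological, but it signals that depth and condition are entangled, so regressing depth out would also remove part of the disease signal.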