scverse / scvi-tools

Deep probabilistic analysis of single-cell and spatial omics data
http://scvi-tools.org/
BSD 3-Clause "New" or "Revised" License

Support for weighted loss either per-batch or per-cell? #2785

Status: Open. kamilkrukowski opened this issue 2 months ago.

kamilkrukowski commented 2 months ago

Could scVI models support a weighted loss that weights individual cells or batch_key groups?
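One way to picture the request: compute a loss per cell (for scVI, the per-cell negative ELBO), then reduce it with a weighted mean instead of a plain mean. The sketch below derives per-cell weights from batch labels using inverse batch frequency, so every batch contributes the same total weight. The function names are hypothetical and this is not an scvi-tools API; the thread does not point to a built-in hook, so wiring it into training would presumably require a custom module's loss.

```python
from collections import Counter


def inverse_frequency_weights(batch_labels):
    """Hypothetical helper: weight each cell inversely to its batch size.

    Weights are scaled so they average to 1.0 over all cells, which keeps a
    weighted loss on the same scale as an unweighted mean.
    """
    counts = Counter(batch_labels)
    n_cells = len(batch_labels)
    n_batches = len(counts)
    # n_cells / (n_batches * count) gives every batch the same total weight.
    return [n_cells / (n_batches * counts[b]) for b in batch_labels]


def weighted_mean_loss(per_cell_losses, weights):
    """Hypothetical helper: reduce per-cell losses with per-cell weights."""
    if len(per_cell_losses) != len(weights):
        raise ValueError("need exactly one weight per cell")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(loss * w for loss, w in zip(per_cell_losses, weights))
    return weighted_sum / total_weight


if __name__ == "__main__":
    batches = ["a", "a", "a", "b"]
    losses = [1.0, 1.0, 1.0, 5.0]
    w = inverse_frequency_weights(batches)
    # The single cell in batch "b" now counts as much as all of batch "a".
    print(weighted_mean_loss(losses, w))
```

Passing explicit per-cell weights (rather than only per-batch ones) keeps both use cases in one function: per-batch weighting is just the special case where all cells in a batch share a weight.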

Some potential uses

Comments

To my knowledge, oversampling or undersampling is not a viable solution here, because scvi-tools requires an in-memory AnnData object containing the entire dataset. Is there a way to "stream" data loading with meaningful oversampling, without holding every oversampled copy in memory for the whole of training?
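A common way to oversample without materializing copies is to sample cell *indices* with replacement, in proportion to per-cell weights, and build each minibatch from those indices. The dataset stays in memory once; only integers are drawn. The sketch below is plain Python with hypothetical function names. PyTorch's `torch.utils.data.WeightedRandomSampler` implements the same idea for a `DataLoader`; whether scvi-tools' own data loaders accept a custom sampler is an assumption not confirmed in this thread.

```python
import itertools
import random


def weighted_index_stream(weights, n_draws, seed=None):
    """Hypothetical helper: yield cell indices drawn with replacement.

    Index i is drawn with probability weights[i] / sum(weights). Only
    indices are produced, so oversampled cells cost no extra data memory.
    """
    rng = random.Random(seed)
    population = range(len(weights))
    # Precompute cumulative weights once so each draw is a binary search.
    cum_weights = list(itertools.accumulate(weights))
    for _ in range(n_draws):
        yield rng.choices(population, cum_weights=cum_weights, k=1)[0]


def minibatches(index_stream, batch_size):
    """Group a stream of indices into lists of batch_size (last may be short)."""
    batch = []
    for idx in index_stream:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Cell 1 has three times the weight of cell 0, so it appears ~75% of the time.
    stream = weighted_index_stream([1.0, 3.0], n_draws=8, seed=0)
    for batch in minibatches(stream, batch_size=4):
        print(batch)  # each list would be used to slice rows of the dataset
```

Because draws are generated lazily, an "epoch" can be any number of draws, and the weights could be the inverse-frequency batch weights from a per-batch scheme.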

canergen commented 1 month ago

Hi, thanks for your question. @martinkim0, we wanted to add a stratified train/validation splitter at some point but never followed up on this. I expect it would yield a very similar result. For now, you can subset your object before running scVI.
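As a rough illustration of what a stratified train/validation split means here: group cells by a label (such as their batch_key value) and send the same fraction of each group to validation, so small batches are represented in both splits. This is a minimal plain-Python sketch with a hypothetical function name; it is not scvi-tools' own splitter, whose behavior this thread does not describe.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction=0.1, seed=0):
    """Hypothetical helper: split cell indices into train/validation sets.

    Each distinct label contributes about validation_fraction of its own
    cells to the validation set.
    """
    if not 0.0 <= validation_fraction < 1.0:
        raise ValueError("validation_fraction must be in [0, 1)")
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, validation = [], []
    for indices in by_label.values():
        rng.shuffle(indices)  # random choice of which cells go to validation
        n_val = round(len(indices) * validation_fraction)
        validation.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(validation)


if __name__ == "__main__":
    labels = ["a"] * 10 + ["b"] * 20
    train_idx, val_idx = stratified_split(labels, validation_fraction=0.1)
    print(len(train_idx), len(val_idx))  # 27 3
```

With an AnnData object, the returned positional indices could be used to subset it, e.g. `adata[train_idx].copy()`, in line with the suggestion to subset before running scVI.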