scverse / scvi-tools

Deep probabilistic analysis of single-cell and spatial omics data
http://scvi-tools.org/
BSD 3-Clause "New" or "Revised" License

Support for weighted loss either per-batch or per-cell? #2785

Closed kamilkrukowski closed 3 months ago

kamilkrukowski commented 6 months ago

Could scVI models support a weighted loss, with weights assigned to individual cells or to batch_key groups?

Some potential uses

Comments

To my knowledge, oversampling/undersampling is not a viable solution here because scvi-tools requires in-memory AnnData objects containing the entire dataset. Is there any way to "stream" data loading with meaningful oversampling that does not require holding all oversampled entries in memory for the entirety of training?
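To make the question concrete, here is a minimal sketch of index-level oversampling, where only integer indices are drawn per minibatch and no oversampled copy of the data is ever materialized. It uses NumPy only and is not the scvi-tools data loader; the function name `weighted_index_batches` and the toy labels are hypothetical, and wiring such indices into scvi-tools training is not shown.

```python
import numpy as np


def weighted_index_batches(labels, batch_size, n_batches, seed=0):
    """Yield minibatch index arrays (hypothetical helper, not a scvi-tools API).

    Each cell is drawn with probability inversely proportional to the size
    of its group, so every group receives the same total sampling mass.
    """
    labels = np.asarray(labels)
    # inverse[i] is the group id of cell i; counts[g] is the size of group g.
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]
    probs = weights / weights.sum()
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        # Sampling with replacement: rare cells can repeat within a batch.
        yield rng.choice(labels.size, size=batch_size, replace=True, p=probs)


# Toy labels: 90 cells of an abundant type, 10 cells of a rare type.
labels = ["T cell"] * 90 + ["rare"] * 10
batches = list(weighted_index_batches(labels, batch_size=32, n_batches=5))
```

Each yielded array could index rows of the in-memory object one batch at a time, so the memory cost of oversampling is a batch of integers rather than duplicated cells.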

canergen commented 5 months ago

Hi, thanks for your question. @martinkim0, we wanted to have a stratified train/validation splitter at some point but never followed up on it. I guess it would yield a very similar result. For now, you can subset your object before running scVI.
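A rough sketch of that subsetting step, assuming the goal is to cap abundant groups so that rare ones carry more relative weight. The helper `downsample_indices` and the toy labels are hypothetical; only NumPy is used to pick the indices, and the AnnData subsetting itself is shown as a comment.

```python
import numpy as np


def downsample_indices(labels, max_per_group, seed=0):
    """Return sorted indices keeping at most `max_per_group` cells per group
    (hypothetical helper, not part of scvi-tools)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    keep = []
    for group in np.unique(labels):
        idx = np.flatnonzero(labels == group)
        if idx.size > max_per_group:
            # Randomly drop cells from groups that exceed the cap.
            idx = rng.choice(idx, size=max_per_group, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))


# Toy labels: 90 cells of an abundant type, 10 cells of a rare type.
labels = np.array(["T cell"] * 90 + ["rare"] * 10)
keep = downsample_indices(labels, max_per_group=20)

# With an AnnData object `adata` whose obs column matches `labels`,
# the subset would be taken before model setup as:
#     adata_sub = adata[keep].copy()
```

In this toy case the rare type goes from 10% of all cells to a third of the subset, at the cost of discarding most of the abundant type.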

canergen commented 3 months ago

You can now use stratified train splitters in main (#2902; this will be released in scvi-tools 1.2). You can define a train set in which small cell types are more abundant, which has an effect similar to increasing their weights during training.
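As an illustration of such a train set, the sketch below builds train and validation indices in which every cell type contributes at most a fixed number of training cells, so small cell types are relatively more abundant in training than in the full data. The helper `balanced_train_split` and its parameters are hypothetical, and how the resulting indices are handed to the splitter added in #2902 is deliberately not shown; check the scvi-tools 1.2 documentation for the actual argument.

```python
import numpy as np


def balanced_train_split(labels, n_train_per_group, min_valid_per_group=2, seed=0):
    """Split cell indices into train/validation sets (hypothetical helper).

    Each group sends at most `n_train_per_group` cells to training while
    holding back at least `min_valid_per_group` cells for validation.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, valid = [], []
    for group in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == group))
        n_train = max(0, min(n_train_per_group, idx.size - min_valid_per_group))
        train.append(idx[:n_train])
        valid.append(idx[n_train:])
    return np.sort(np.concatenate(train)), np.sort(np.concatenate(valid))


# Toy labels: 90 cells of an abundant type, 10 cells of a rare type.
labels = np.array(["T cell"] * 90 + ["rare"] * 10)
train_idx, valid_idx = balanced_train_split(labels, n_train_per_group=20)
rare_share_train = float((labels[train_idx] == "rare").mean())
```

Here the rare type makes up 8 of 28 training cells, compared with 10 of 100 cells overall, which is the kind of enrichment the comment describes.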