Issue closed by kamilkrukowski.
Hi, thanks for your question. @martinkim0, we had planned to add a stratified train/validation splitter at some point but never followed up on it. I expect it would give a very similar result. For now, you can subset your object before running scVI.
Stratified train splitters are now available in main (#2902; they will be released in scvi-tools 1.2). You can define a training set in which small cell types are more abundant, which has an effect similar to up-weighting them during training.
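A minimal sketch of the subsetting workaround, assuming the cell-type labels are available as a plain list (for example taken from a column of adata.obs). `capped_stratified_indices` is a hypothetical helper, not part of scvi-tools, and it does not reproduce the splitter from #2902. It caps every label at `cap` cells, so large cell types are downsampled while small ones are kept whole and become relatively more abundant.

```python
import random
from collections import defaultdict


def capped_stratified_indices(labels, cap, seed=0):
    """Return sorted indices of a subset with at most `cap` cells per label.

    Groups larger than `cap` are randomly downsampled (without replacement);
    smaller groups are kept whole, so rare labels gain relative weight.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    selected = []
    for label in sorted(by_label):
        idx = by_label[label]
        if len(idx) > cap:
            idx = rng.sample(idx, cap)  # downsample the large group
        selected.extend(idx)
    return sorted(selected)


# Toy labels: one dominant cell type and two rare ones.
labels = ["T"] * 900 + ["B"] * 90 + ["pDC"] * 10
train_idx = capped_stratified_indices(labels, cap=100)
```

With an AnnData object the result could then be used as `adata[train_idx].copy()` before setting up and training the model; the choice of `cap` is a tuning assumption, not a recommended value.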
Could scVI models support a weighted loss, weighting individual cells or batch_key groups?
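To make the request concrete, here is a framework-agnostic sketch of per-cell weighting, assuming a per-cell loss value is available for each cell in a minibatch. Nothing in this thread says scvi-tools exposes such a hook; `inverse_frequency_weights` and `weighted_mean_loss` are hypothetical names. Each cell gets a weight proportional to 1 / (size of its batch_key group), rescaled to average 1, so every group contributes the same total weight to the loss.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Per-cell weights proportional to 1 / (size of the cell's group),
    rescaled so that the weights average to 1 over all cells."""
    counts = Counter(groups)
    raw = [1.0 / counts[g] for g in groups]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_mean_loss(per_cell_loss, weights):
    """Weighted average of per-cell losses instead of a plain mean."""
    total = sum(loss * w for loss, w in zip(per_cell_loss, weights))
    return total / sum(weights)


# Toy example: batch2 has a single cell, batch1 has three.
groups = ["batch1"] * 3 + ["batch2"]
weights = inverse_frequency_weights(groups)  # batch1 cells: 2/3 each, batch2 cell: 2
```

In a PyTorch training step the same idea would be `(loss * w).sum() / w.sum()` on tensors; where exactly such a weight would enter scVI's loss is left open here.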
Some potential uses
Comments
To my knowledge, oversampling/undersampling is not a viable solution here, because scvi-tools requires an in-memory AnnData object containing the entire dataset. Is there any way to "stream" data loading with meaningful oversampling that does not require holding all oversampled entries in memory for the whole of training?
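One common way to oversample without duplicating rows is to sample *indices* with replacement, weighted by group, and fetch rows per minibatch; this is the idea behind PyTorch's `torch.utils.data.WeightedRandomSampler`. The sketch below is a stdlib-only illustration with a hypothetical `weighted_index_batches` generator; it does not show how (or whether) such a sampler plugs into scvi-tools' data loaders, and it only avoids storing oversampled copies, not the requirement that the AnnData itself be in memory.

```python
import random
from collections import Counter
from itertools import accumulate


def weighted_index_batches(groups, batch_size, num_samples, seed=0):
    """Yield minibatches of row indices drawn with replacement, where each
    row's probability is inversely proportional to the size of its group.

    Only integer indices are produced; no data rows are copied, so rare
    groups are oversampled without storing duplicated entries.
    """
    rng = random.Random(seed)
    counts = Counter(groups)
    # Precompute cumulative weights once instead of on every draw.
    cum_weights = list(accumulate(1.0 / counts[g] for g in groups))
    population = range(len(groups))
    drawn = 0
    while drawn < num_samples:
        k = min(batch_size, num_samples - drawn)
        yield rng.choices(population, cum_weights=cum_weights, k=k)
        drawn += k


groups = ["a"] * 900 + ["b"] * 100
for batch in weighted_index_batches(groups, batch_size=128, num_samples=1000):
    pass  # e.g. load the rows `batch` from the dataset and run a training step
```

Because both groups receive the same total weight here, roughly half of the drawn indices fall in the rare group "b", even though it holds only 10% of the rows.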