vals / Blog


batch effects #3

Open wangjiawen2013 opened 5 years ago

wangjiawen2013 commented 5 years ago

Hi, in the Lukassen 2018 data, batch1 and batch2 do not align well using DCA (DCA on Lukassen.ipynb), while scVI seems to align the two mice quite well (scvi on Lukassen.ipynb). Which one should I use?

vals commented 5 years ago

Hi Jiawen,

Romain told me that the data probably aligns well in scVI even without batch correction because of the size factor scaling that scVI does.
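To illustrate the size factor point, here is a toy sketch in plain Python. It is not scVI code, and the gene proportions and depths are made up: two cells share the same expression profile but come from batches sequenced to different depths, and dividing each cell by its total count removes the apparent difference. scVI handles library size inside the model rather than with a division like this, so treat it only as an illustration of the idea.

```python
# Toy illustration (not scVI code): two batches with the same expression
# profile but different sequencing depth. All numbers are hypothetical.
profile = [0.5, 0.3, 0.2]  # hypothetical gene proportions


def simulate_cell(depth):
    # Expected counts for a cell sequenced to the given total depth
    return [p * depth for p in profile]


def scale_by_size_factor(counts):
    # Divide by the cell's total count (its "size factor")
    total = sum(counts)
    return [c / total for c in counts]


batch1_cell = simulate_cell(1000)  # shallow batch
batch2_cell = simulate_cell(5000)  # deep batch

# Largest per-gene difference before and after scaling
raw_gap = max(abs(a - b) for a, b in zip(batch1_cell, batch2_cell))
scaled_gap = max(
    abs(a - b)
    for a, b in zip(
        scale_by_size_factor(batch1_cell),
        scale_by_size_factor(batch2_cell),
    )
)
print(raw_gap, scaled_gap)
```

If the two mice differ mainly in sequencing depth, this kind of scaling alone could explain why they overlap without explicit batch correction.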

I haven't used DCA much since the paper came out, but I use scVI almost every day. I don't remember whether DCA has batch correction methods built in, but this is a feature of scVI that I find works very well.

wangjiawen2013 commented 5 years ago

I am new to scVI. I notice that your scVI pipeline is different from the one in the scVI basic tutorial (https://github.com/YosefLab/scVI/blob/master/tests/notebooks/basic_tutorial.ipynb). What's the difference? Did you make any customizations to obtain better results?

vals commented 5 years ago

How do you mean? The only differences I can think of are that I store data in AnnData objects rather than GeneDatasets, and I use a different library for tSNE visualization.

wangjiawen2013 commented 5 years ago

I mean the pipeline in this link "https://github.com/vals/Blog/tree/master/180420-scrna-autoencoders". In "https://github.com/vals/Blog/blob/master/181004-integrating-cortex-data/Integrate%20frontal%20cortex%20data.ipynb", the pipeline is the same as that in scVI basic tutorial.

vals commented 5 years ago

Oh, the post from last April used an old version of scVI that is now deprecated.

wangjiawen2013 commented 5 years ago

Hi, do you know when to use gene/gene-batch/gene-label/gene-cell as the "dispersion" parameter in VAE?

:param dispersion: One of the following
    * ``'gene'`` - dispersion parameter of NB is constant per gene across cells
    * ``'gene-batch'`` - dispersion can differ between different batches
    * ``'gene-label'`` - dispersion can differ between different labels
    * ``'gene-cell'`` - dispersion can differ for every gene in every cell

vals commented 5 years ago

Hi,

I typically use gene-batch because, when plotting mean vs. variance for genes per batch, I have noticed that the overdispersion trend tends to differ between batches.
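As a rough sketch of that kind of check, the snippet below estimates a negative binomial inverse dispersion for one gene separately in each batch, using the method-of-moments relation var = mu + mu^2 / theta. The counts are made up and `nb_dispersion` is a hypothetical helper, not part of scVI. It only illustrates how a per-batch difference would show up.

```python
from statistics import mean, variance


def nb_dispersion(counts):
    """Method-of-moments estimate of the NB inverse dispersion theta,
    from var = mu + mu**2 / theta.

    Returns None when the counts are not overdispersed (var <= mu).
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance
    if var <= mu:
        return None
    return mu ** 2 / (var - mu)


# Hypothetical counts for a single gene in two batches (made-up numbers)
gene_counts = {
    "batch1": [0, 2, 4, 6, 8],
    "batch2": [0, 0, 4, 8, 8],
}

# One estimate per batch; clearly different values suggest 'gene-batch'
theta_per_batch = {
    batch: nb_dispersion(counts) for batch, counts in gene_counts.items()
}
print(theta_per_batch)
```

Both batches here have the same mean but different variance, so the estimates differ. A single shared `'gene'` dispersion could not capture that.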

I haven't used the supervised mode of scVI much, so can't comment on the effect of gene-label. And the gene-cell option is interesting, but I haven't tried it much.
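To make the four options more concrete, the sketch below lists the shape of the dispersion parameter that each option implies, based only on the docstring descriptions above. The function name and the axis order are assumptions for illustration. The actual tensor layout inside scVI may differ.

```python
def dispersion_shape(dispersion, n_genes, n_batches, n_labels, n_cells):
    """Return the shape of the NB dispersion parameter implied by each option.

    Illustrative only: axis order is an assumption, not scVI's layout.
    """
    shapes = {
        "gene": (n_genes,),                  # one value per gene
        "gene-batch": (n_genes, n_batches),  # one value per gene per batch
        "gene-label": (n_genes, n_labels),   # one value per gene per label
        "gene-cell": (n_cells, n_genes),     # one value per gene in every cell
    }
    if dispersion not in shapes:
        raise ValueError(f"unknown dispersion option: {dispersion!r}")
    return shapes[dispersion]


# Hypothetical dataset sizes
for option in ("gene", "gene-batch", "gene-label", "gene-cell"):
    print(option, dispersion_shape(option, n_genes=2000, n_batches=2,
                                   n_labels=5, n_cells=10000))
```

The number of dispersion values grows from `'gene'` to `'gene-cell'`, so the more flexible options have more to estimate from the same data.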