scverse / scvi-tools

Deep probabilistic analysis of single-cell and spatial omics data
http://scvi-tools.org/
BSD 3-Clause "New" or "Revised" License
1.2k stars 344 forks source link

stereoscope implementation #851

Closed giovp closed 3 years ago

giovp commented 3 years ago

Hi @romain-lopez ,

following almaan/stereoscope#18, I finally got time to check out your implementation of stereoscope in romain/spatial branch. Don't have much to comment, it's great! I really like the decoupling between scmodel and spatial model, making it super convenient to fit one model for e.g. an atlas and reuse it multiple times for visium slides. In terms of default n_epochs, they made a lot of sense also for the data I tried it on. Maybe I would be more relaxed with the spatial model (I increased it to 7k) but of course arg very dependent to data/settings.

Do you still plan to merge it to master? I'm sure many would use it. Thank you again! 🎉

romain-lopez commented 3 years ago

Hi @giovp,

Thanks for your comment :+1:

We are actually working on a v1.0 version of scvi-tools, in which we will feature this reimplementation of stereoscope. The front end won't probably change but we are doing more work on the backend (@adamgayoso and @galenxing are working on the interaction with pytorch-lightning). Ideally, it would even be more straighforward to implement other models in the codebase.

For the specific case of stereoscope, I expect this feature will be merged to master in a month or so? @adamgayoso what do you think?

adamgayoso commented 3 years ago

Yeah a month sounds about right, though an official release might take a bit longer. Excited to have this method in the codebase!

cartal commented 3 years ago

Hi, this is great!

For the lesser mortals who have lots of CPU and time but no access to GPU, is there a way to run this without it? I somehow thought that by adding use_cuda = False I would override the GPU use.

At the moment I have tried to run this step:

spatial_model = stStereoscope(st_adata, params, use_cuda = False) spatial_model.train(lr = 0.01, n_epochs = 3000, train_size = 1, frequency = 5)

But get the following error:

INFO     Training for 3000 epochs                                                            
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-50-fdee515c64dc> in <module>
      1 spatial_model = stStereoscope(st_adata, params, use_cuda = False)
----> 2 spatial_model.train(lr = 0.01, n_epochs = 3000, train_size = 1, frequency = 5)

~/tools/scvi-tools/scvi/model/stereoscope.py in train(self, n_epochs, train_size, test_size, lr, frequency, train_fun_kwargs, **kwargs)
    201         if "lr" not in train_fun_kwargs:
    202             train_fun_kwargs["lr"] = lr
--> 203         self.trainer.train(**train_fun_kwargs)
    204         self.is_trained_ = True
    205 

~/tools/scvi-tools/scvi/core/trainers/trainer.py in train(self, n_epochs, lr, eps, params, **extras_kwargs)
    183         self.compute_metrics_time = 0
    184         self.n_epochs = n_epochs
--> 185         self.compute_metrics()
    186 
    187         self.on_training_begin()

~/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py in decorate_no_grad(*args, **kwargs)
     47         def decorate_no_grad(*args, **kwargs):
     48             with self:
---> 49                 return func(*args, **kwargs)
     50         return decorate_no_grad
     51 

~/tools/scvi-tools/scvi/core/trainers/trainer.py in compute_metrics(self)
    149                                 computed.add(out_str)
    150                     for metric in self.metrics_to_monitor:
--> 151                         result = getattr(scdl, metric)()
    152                         out_str = metric + "_" + name
    153                         self.history[out_str] += [result]

~/tools/scvi-tools/scvi/core/data_loaders/scvi_data_loader.py in elbo(self)
    364             ind_x = tensors["ind_x"]
    365             labels = tensors[_CONSTANTS.LABELS_KEY]
--> 366             reconst_loss, local_kl, global_kl = self.model(sample_batch, labels, ind_x)
    367             log_lkl += torch.sum(reconst_loss + local_kl).item()
    368         n_samples = len(self.indices)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/tools/scvi-tools/scvi/core/modules/ldeconv.py in forward(self, x, y, ind_x)
    315         """
    316         # Parameters for z latent distribution
--> 317         outputs = self.inference(x, ind_x)
    318         px_rate = outputs["px_rate"]
    319         px_o = outputs["px_o"]

~/tools/scvi-tools/scvi/core/modules/ldeconv.py in inference(self, x, ind_x)
    276 
    277         if self.use_cuda:
--> 278             w = w.cuda()
    279             px_o = self.px_o.cuda()
    280             eps = eps.cuda()

~/miniconda3/lib/python3.8/site-packages/torch/cuda/__init__.py in _lazy_init()
    194             raise RuntimeError(
    195                 "Cannot re-initialize CUDA in forked subprocess. " + msg)
--> 196         _check_driver()
    197         torch._C._cuda_init()
    198         _cudart = _load_cudart()

~/miniconda3/lib/python3.8/site-packages/torch/cuda/__init__.py in _check_driver()
     96         if torch._C._cuda_getDriverVersion() == 0:
     97             # found no NVIDIA driver on the system
---> 98             raise AssertionError("""
     99 Found no NVIDIA driver on your system. Please check that you
    100 have an NVIDIA GPU and installed a driver from

AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
romain-lopez commented 3 years ago

Hi!

Thanks for pointing this out. We are working through API changes that are pretty foundational for the back-end. I will address this in the next couple of weeks once Adam and Galen first merged these back-end changes (PyTorch ligthning!).

Happy that you're interested in using this :+1: It'll come out soon with the CPU version!

Best, Romain

romain-lopez commented 3 years ago

If you really want to run this, Google Colab should work perfectly though :)

cartal commented 3 years ago

This is super exciting, thank you!

romain-lopez commented 3 years ago

Some more progress on this, I have found that adding a per-cell scaling factor on the deconvolution model gives better performance on simulations (25 % improvement on MSE for predicting cell type proportion) AND speeds up the convergence rate (between 5x and 10x to achieve the local minimum of the objective function). These results are on a remote local branch, but we will have to discuss whether we want to provide the vanilla version or my slight modification on the codebase.

vitkl commented 3 years ago

Interesting, I also see improvement in accuracy of estimating cell proportions on simulated data when adding a per-location scaling factor (in cell2location). However, it breaks inference of absolute cell abundance on simulated data (which cell2location does as well as proportions, Fig 1E, https://www.biorxiv.org/content/10.1101/2020.11.15.378125v1.full), and on real data too. So I would be concerned with over-normalising by adding that parameter and breaking the mapping for some cell types. By the way, would be great if you could add cell2location to your benchmarks.

romain-lopez commented 3 years ago

That's interesting! thanks for mentioning that! I did not actually consider estimating the absolute cell abundance.

vitkl commented 3 years ago

I think it is not possible to do without informative priors because per-gene and per-location scaling factors are non-identifiable with the cell type abundance parameters. We also see a minor improvement when using absolute cell abundance as a predictor for detecting presence-absence of cell types (Fig 1C-type analysis, area 0.83 -> 0.84). But more importantly, you can do the downstream analysis on the estimates directly without usual challenges with proportional data.

vitkl commented 3 years ago

Hi Adam

One question about this implementation - is the inference amortized with encode NN? Sorry want able to determine this from code. Maybe you could mention in the documentation.

Thanks!

On Tue, 19 Jan 2021, 21:17 Adam Gayoso, notifications@github.com wrote:

Closed #851 https://github.com/YosefLab/scvi-tools/issues/851 via #852 https://github.com/YosefLab/scvi-tools/pull/852.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/YosefLab/scvi-tools/issues/851#event-4224688323, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFMFTV3L62L4GIFKOXICVQLS2XZHLANCNFSM4TQ5KQPA .

romain-lopez commented 3 years ago

No it is not, this is an almost line-by-line reimplementation of stereoscope. Only differs a couple of minor points that we will detail in the docs. Thanks for pointing this out.

vitkl commented 3 years ago

Thanks for clarifying! It is quite easy to use so thank you for reimplementation!

adamgayoso commented 3 years ago

thanks @vitkl -- we might end up changing a few things to make it a tiny bit more user-friendly before official release.