simonwm / tacco

TACCO: Transfer of Annotations to Cells and their COmbinations
BSD 3-Clause "New" or "Revised" License

Specify numpy versions (incompatible with numpy 2.0) #19

Open Rafael-Silva-Oliveira opened 1 week ago

Rafael-Silva-Oliveira commented 1 week ago

Hey, I'm trying to use this tool, but the environment file does not pin a numpy version, so numpy 2.0 gets installed by default. However, some pieces of the code rely on behavior that changed in numpy 2.0, such as np.array(..., copy=False), which should now be np.asarray(...).

https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword

    a = np.asarray(a, dtype=np.float64)
ValueError: Unable to avoid copy while creating an array as requested.
If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.

Might be worth pinning the package versions in the environment file.
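To illustrate the change the migration guide describes, here is a minimal sketch (the helper name `to_float64` is made up for this example and is not part of tacco). Under NumPy 2.x, `np.array(ints, dtype=np.float64, copy=False)` raises the ValueError quoted above whenever a copy cannot be avoided, while `np.asarray` copies only when it has to and behaves the same on NumPy 1.x and 2.x:

```python
import numpy as np

def to_float64(a):
    """Return `a` as a float64 array, copying only when a copy is required."""
    # np.asarray allows a copy when needed; np.array(..., copy=False) does not
    # under NumPy 2.x and raises ValueError instead.
    return np.asarray(a, dtype=np.float64)

ints = np.array([1, 2, 3])           # integer dtype: conversion requires a copy
floats = np.array([1.0, 2.0, 3.0])   # already float64: no copy required

converted = to_float64(ints)
unchanged = to_float64(floats)

print(converted.dtype, np.shares_memory(unchanged, floats))
```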

JWatter commented 1 week ago

Hi,

thanks for the hint! Numpy v2.0 does indeed introduce breaking changes, and we will tackle this with the next tacco release. For now, the recommended way to install tacco is via conda/mamba using:

conda install -c conda-forge tacco

This should install a working tacco instance with numpy 1.26.4.
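A quick way to confirm which NumPy major version ended up in the environment is sketched below. The helper `numpy_major_version` is a hypothetical name for this example, and it assumes the version string starts with a plain integer major version:

```python
import numpy as np

def numpy_major_version(version=None):
    """Return the major version of a NumPy version string (default: installed NumPy)."""
    version = np.__version__ if version is None else version
    return int(version.split(".")[0])

if numpy_major_version() >= 2:
    print("NumPy 2.x detected: code using np.array(..., copy=False) may fail.")
else:
    print("NumPy 1.x detected.")
```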

Hope this helps!

Rafael-Silva-Oliveira commented 1 week ago

Thank you @JWatter! Seems to work great now :) Also, any idea how well TACCO could work with the new Visium HD data? I was testing it on a Visium HD lung cancer dataset from 10x (8 micron resolution) and got this with default parameters (cell types are not labeled in this picture, but the red was classified as B cells by the OT method):

[image: OT annotation with default parameters, B cells in red]

As a comparison, here are the B cell markers (average of the log-normalized expression of the B cell markers, from the 10x Loupe Browser):

[image: average log-normalized B cell marker expression]

Also, it was surprisingly fast! Compared with methods like DestVI, which took 2 hours to train for 15 epochs, this one was able to give the cell types for 600k bins in just 4 minutes. Perhaps because it is so fast, I'm skeptical about how accurate the predictions are, but looking at most of the cell markers (like the pictures above), the results seem to make sense! Just wondering whether you have tested TACCO with 10x Visium HD and whether there are any recommendations on how to properly deconvolute the cell types in the 8 micron resolution dataset.

JWatter commented 1 week ago

Glad to see that TACCO seems to work nicely with Visium HD! We have not tested it on Visium HD yet. While it is an interesting new experimental method, we expect that for TACCO and most other compositional annotation methods, Visium HD should in principle look just like a somewhat sparser regular Visium with more spots. Because the spatial resolution is higher, a spot also more often contains just a single cell type than in regular Visium. So one might expect that the compositional nature of Visium HD spots is actually not so important, and that basically any method that gets the majority cell type per spot correct is sufficient for the job.

But the default parameter settings may not be the best, and you might want to play around with them a little. One option is to exploit this tendency towards a single dominant cell type per spot by lowering bisections from its default value, maybe even to 0. That might still be sufficient to capture the majority signal per pixel and might make the annotation even faster (maybe under a minute). Then you could even try smaller bins like 2µm (expect roughly 16x the pixels and 16x the runtime compared to the 8µm case). One could expect that with higher spatial resolution the annotation gets more categorical but still remains smooth in space, until there is not enough data per bin to get the cell type right and the signal vanishes in noise. To force actual categorical results you can use the option max_annotation.
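To make "forcing categorical results" concrete, here is a conceptual sketch in plain NumPy of what keeping only the strongest annotation(s) per bin looks like. This is an illustration only and not TACCO's implementation; the function `keep_top_annotations` and the toy fractions are made up for this example:

```python
import numpy as np

def keep_top_annotations(fractions, max_annotation=1):
    """Keep the `max_annotation` largest fractions per row and renormalize rows to 1."""
    fractions = np.asarray(fractions, dtype=np.float64)
    if max_annotation < 1:
        raise ValueError("max_annotation must be at least 1")
    k = min(max_annotation, fractions.shape[1])
    # Column indices of the k largest fractions in every row.
    top = np.argsort(-fractions, axis=1)[:, :k]
    rows = np.arange(fractions.shape[0])[:, None]
    kept = np.zeros_like(fractions)
    kept[rows, top] = fractions[rows, top]
    # Renormalize each row; rows that sum to zero stay all-zero.
    totals = kept.sum(axis=1, keepdims=True)
    np.divide(kept, totals, out=kept, where=totals > 0)
    return kept

# Toy compositional annotation: 2 bins x 3 cell types.
toy = np.array([[0.70, 0.20, 0.10],
                [0.40, 0.45, 0.15]])
categorical = keep_top_annotations(toy, max_annotation=1)
top_two = keep_top_annotations(toy, max_annotation=2)
```

With `max_annotation=1` every bin is assigned entirely to its majority cell type, which matches the expectation above that high-resolution bins mostly contain a single cell type.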

Anyway, as the default method OT runs so quickly, you can try out different settings rather fast and see what works best in your particular case, either now or after getting a better feeling for the data from some downstream analyses. If you lack a proper ground truth, the best metric you have for judging the correctness of an annotation is whether it makes sense biologically.
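One way to make the marker comparison from earlier in the thread a bit more quantitative is to correlate the annotated fraction of a cell type with an average log-normalized marker score per bin. The sketch below is a rough sanity check, not a validation method from TACCO; the function names, the normalization target of 1e4 counts per bin, and the toy data are all assumptions for this example:

```python
import numpy as np

def marker_score(counts, marker_idx, target_sum=1e4):
    """Mean log1p-normalized expression of the marker genes for every bin."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    # Scale every bin to `target_sum` total counts; empty bins get scale 0.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    lognorm = np.log1p(counts * scale)
    return lognorm[:, marker_idx].mean(axis=1)

def agreement(predicted_fraction, score):
    """Pearson correlation between annotated fraction and marker score."""
    return float(np.corrcoef(predicted_fraction, score)[0, 1])

# Toy data: 4 bins x 3 genes, where genes 0 and 1 are markers of the cell type.
toy_counts = np.array([[10, 10, 0],
                       [0, 0, 20],
                       [5, 5, 10],
                       [0, 0, 0]])
toy_fraction = np.array([0.9, 0.0, 0.5, 0.0])  # annotated fraction per bin

score = marker_score(toy_counts, marker_idx=[0, 1])
r = agreement(toy_fraction, score)
```

A high correlation only says that the annotation and the markers agree; as noted above, marker genes are themselves an imperfect proxy for a missing ground truth.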

While creating TACCO we invested a lot of work into making it really fast while not compromising on accuracy and flexibility (you might want to check out the paper and/or the benchmarking notebook in https://simonwm.github.io/tacco/notebooks/benchmarking.html ). So at least in our experience the TACCO results do make sense. But again, for Visium HD we have no experience.

Hope it helps nevertheless!

Rafael-Silva-Oliveira commented 1 week ago

Thank you for the reply @JWatter! So far TACCO has been the most accurate tool when comparing the annotations with the cell marker expression on Visium HD data (besides DestVI, which is super slow to train, and my own implementation using a boosting algorithm). To be fair, it is also the only one that really worked well, since most other deconvolution tools are not optimized to handle this much data.

Would be super cool if the OT method could also take in cell marker genes to guide the annotations! I've noticed that, for example, for the fibroblast cell types the OT method doesn't seem to capture the same pattern that I see in the cell marker expression. So I think it would be quite interesting to see a future adaptation of OT that uses the cell markers for each cell type as a "prior" of some sort. Is this something that is planned for the future by any chance (or is it even compatible with the method used)? :)

Would also be quite nice to add a tutorial using Visium HD data, since it's a very recent technology and I'm sure more people are looking for reliable tools to annotate this data (and tacco seems to be quite decent at it).

An example of the boosting algorithm I'm developing (on the left), which captures a pattern similar to the average cell marker expression (on the right): image

In the initial picture I posted of the OT annotation, "fibroblasts" are the big greenish mesh that you see. I'm not saying this is wrong (it is quite possible that the major cell type present is fibroblasts and the markers simply don't capture that well enough), but I was hoping to see a bit more of the AT1 and AT2 cells (which are present in the reference dataset). This is where I think the option of passing cell markers for each category to guide the annotations could improve the results even more and make TACCO the standard tool for single-cell spatial transcriptomics annotation.

Would also be quite nice to see TACCO become part of scverse in the future (https://github.com/scverse/ecosystem-packages)!

Thanks!