r-trimbour / atacnet

GNU Affero General Public License v3.0

Happy to have found this package! Would like to offer suggestions #1

Closed shahrozeabbas closed 1 month ago

shahrozeabbas commented 5 months ago

Hello!

I'm currently building a large-scale atlas from single-cell ATAC data and ran into the problem of how to run Cicero in R on a dataset this large. I was hoping to try this implementation, and I also wanted to offer a few suggestions and ideas about it.

I realize the intended purpose is to replicate Cicero, but perhaps you could extend the algorithm for a substantial improvement?

First, regarding the metacell calculation: a recently published paper computes metacells using archetype analysis, and I wonder whether using the output of that algorithm would improve results further. Here is the link to the SEACells algorithm.

Second, I wonder whether the metacell calculation could be avoided entirely by implementing FAVA for calculating correlations. It uses a VAE to embed the counts in a latent space, which is then used to compute correlations.

Third, if you are still planning to keep the implementation close to the original Cicero, would it be possible to code it so that the user can provide any low-dimensional representation of the data to compute metacells or co-accessibility? I think the original relies on UMAP/t-SNE, but it would be nice to be able to supply PCA or PeakVI embeddings for the calculation.
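To make the third suggestion concrete, here is a minimal sketch of metacell aggregation that accepts any embedding (PCA, PeakVI, UMAP, ...). It is only an illustration, loosely in the spirit of k-nearest-neighbour aggregation. It is not atacnet's or Cicero's actual code. The function name `knn_metacells` and its parameters are hypothetical, and a dense cells x peaks count matrix is assumed.

```python
import numpy as np

def knn_metacells(counts, embedding, k=5, n_metacells=10, seed=0):
    """Sum each randomly chosen seed cell with its nearest neighbours.

    counts:    dense (n_cells, n_peaks) array of accessibility counts
    embedding: (n_cells, n_dims) array, any low-dimensional representation
    Returns the (n_metacells, n_peaks) aggregated matrix and the
    (n_metacells, k) indices of the cells that went into each metacell.
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    seeds = rng.choice(n_cells, size=min(n_metacells, n_cells), replace=False)
    # Squared Euclidean distance from every seed cell to every cell.
    diff = embedding[seeds, None, :] - embedding[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # Indices of the k closest cells per seed (the seed sits at distance 0).
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Sum the counts of the k cells in each neighbourhood.
    return counts[neighbours].sum(axis=1), neighbours

# Toy data: 50 cells, 8 peaks, and a 3-dimensional embedding.
rng = np.random.default_rng(1)
toy_counts = rng.integers(0, 2, size=(50, 8))
toy_embedding = rng.normal(size=(50, 3))
metacells, members = knn_metacells(toy_counts, toy_embedding)
print(metacells.shape, members.shape)
```

Because the embedding is just an array argument, swapping PCA for PeakVI coordinates would not change anything else in the sketch.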

In any case, so glad to have found this and look forward to seeing it developed further!

r-trimbour commented 5 months ago

Hello,

Thanks a lot for your nice comment, I'm glad to see this package could be useful for you! :)

Addressing your 3 points:

  1. I haven't yet tested the impact of different metacell calculations, but one of the main motivations for this package was indeed to update it with more recent methods! I have been advised that SNAP and SEACells would be great additions, thanks for the suggestion :) I'm focusing on other projects for a few weeks, so I won't implement or test it right away, but I'm convinced that metacells from such methods would indeed give better results.

  2. The latent-space idea seems very interesting to me, but far from trivial to use for ATAC. Since we have a distance constraint when calculating correlations between peaks, how would we calculate such a distance if we compare groups of peaks rather than pairs of peaks? It would probably also require constraining the latent-space construction by distance, so that only peaks close to each other are grouped. But I'm happy to hear any suggestions on this, it could be a great addition too!

  3. I'm currently trying to add different metacell strategies that would be easy to use through the package, but since it works on the AnnData format, it should remain very easy to use metacells calculated externally and then work directly on that object. That is what I'm doing in the current comparison to Cicero: calculating metacells in Cicero, then importing them into Atacnet. As for latent variables, this comes back to point 2: it would still require a distance between pairs of (latent) variables, or a specific extension of the algorithm.
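To illustrate the distance constraint from point 2, here is a deliberately simplified sketch: plain Pearson correlation between peaks, kept only for pairs whose genomic positions lie within a window. It is not the method used by Cicero or atacnet, it only shows why each variable needs a genomic coordinate. The function name, the `max_distance` default, and the single-chromosome assumption are all hypothetical.

```python
import numpy as np

def windowed_peak_correlations(metacells, positions, max_distance=500_000):
    """Pearson correlation for peak pairs no farther apart than max_distance.

    metacells: (n_metacells, n_peaks) aggregated accessibility matrix
    positions: genomic coordinate of each peak (one chromosome assumed)
    Returns a list of (peak_i, peak_j, correlation) tuples with i < j.
    """
    positions = np.asarray(positions)
    # Full peaks x peaks correlation matrix: fine for a toy example,
    # far too large for a real atlas.
    corr = np.corrcoef(metacells, rowvar=False)
    # Pairwise genomic distance between peaks.
    gap = np.abs(positions[:, None] - positions[None, :])
    # Keep each pair once (upper triangle) and only if inside the window.
    i, j = np.nonzero(np.triu(gap <= max_distance, k=1))
    return [(int(a), int(b), float(corr[a, b])) for a, b in zip(i, j)]

# Toy data: peaks 0 and 1 are close together, peak 2 is far away.
toy = np.array([[1.0, 2.0, 5.0],
                [2.0, 4.0, 1.0],
                [3.0, 6.0, 2.0],
                [4.0, 8.0, 7.0]])
print(windowed_peak_correlations(toy, [0, 100, 1_000_000], max_distance=500))
```

A latent variable that mixes peaks from across the genome has no single position to put in `positions`, which is the difficulty raised in point 2.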

I hope I answered your different points, thanks a lot for all these suggestions :) Rémi

r-trimbour commented 1 month ago

Hi! I'm closing the issue since atacnet is now available as Circe at github.com/cantinilab/Circe. :)