scverse / pytometry

Flow & mass cytometry analytics.
https://pytometry.readthedocs.io/en/latest/index.html
Apache License 2.0

Update FlowSOM dependency #73

Open mbuttner opened 3 months ago

mbuttner commented 3 months ago

Hi everyone,

Following a brief discussion with @burtonrj in #71: there is a Python implementation of FlowSOM by the original authors (https://github.com/saeyslab/FlowSOM_Python), which offers comprehensive FlowSOM clustering and carries over the functionality of the FlowSOM R package. It depends on scverse packages like pytometry and MuData. The pytometry package currently uses @burtonrj's implementation of FlowSOM, which depends on the packages minisom and consensusclustering. This raises two points: first, we have two parallel implementations whose efforts could be better integrated, and second, we would like to reduce the number of dependencies in pytometry (see #64) as part of the governance strategy.

Possible actions

@burtonrj suggested to

  1. remove the FlowSOM functionality from pytometry and therefore
  2. remove the dependencies consensusclustering and minisom.
  3. Instead, point to the FlowSOM implementation of @saeyslab and add documentation accordingly.

Looking ahead, we should start a discussion about integrating the FlowSOM package into the scverse.

I am happy with this suggestion in general and would like to suggest some modifications to provide continuity for all users who are already using the current FlowSOM implementation in pytometry:

  1. Make consensusclustering and minisom optional dependencies in the next version.
  2. Move the examples for FlowSOM to consensusclustering and replace the current example with a pointer to the FlowSOM Python package.

I'd like to hear @grst's and @quentinblampey's thoughts on this.

grst commented 3 months ago

ping @berombau

I like the idea of using the official FlowSOM, but I'd be curious how it compares to @burtonrj's implementation in terms of speed.

We could keep a wrapper in pytometry for visibility and backwards-compatibility that requires flowSOM as an optional dependency. Referring to it (including some of its nice visualizations) in the tutorial/documentation sounds good to me.
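
A minimal sketch of what such an optional-dependency wrapper could look like, using only the standard library. The helper name `require_optional`, the wrapper name `flowsom_clustering`, and the extra name `pytometry[flowsom]` are hypothetical; `flowsom.FlowSOM` is assumed to be the package's entry point and should be checked against the FlowSOM_Python documentation.

```python
import importlib
import importlib.util


def require_optional(module_name: str, extra: str):
    """Import an optional top-level dependency, or raise a helpful ImportError."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this function. "
            # 'pytometry[<extra>]' is a hypothetical extra name for illustration.
            f"Install it with: pip install 'pytometry[{extra}]'"
        )
    return importlib.import_module(module_name)


def flowsom_clustering(adata, **kwargs):
    """Hypothetical thin wrapper that defers the import until it is called."""
    fs = require_optional("flowsom", extra="flowsom")
    # Assumption: flowsom exposes a FlowSOM class that accepts an AnnData object.
    return fs.FlowSOM(adata, **kwargs)
```

With this pattern, importing pytometry itself never fails when flowsom is missing; the error only appears when the wrapper is actually called, and it tells the user what to install.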

quentinblampey commented 3 months ago

Since FlowSOM_Python is implemented by the original authors and integrated within the scverse ecosystem, it makes sense to use it. The solution proposed by @grst sounds good to me!

It would also be great to check whether the results are consistent between the two packages, but this requires quite some work.
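
One way to sketch such a consistency check: since cluster IDs from two implementations are arbitrary, the comparison needs a measure that ignores label names, such as the adjusted Rand index. The pure-Python version below is an illustration only, not part of either package; in practice `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    # Count pairs of cells that share a cluster in both, in a only, in b only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    # Expected agreement under random labelings with the same cluster sizes.
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Same partition with swapped cluster names scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"]))
```

A score near 1 would indicate that the two implementations group the cells almost identically, while a score near 0 would indicate agreement no better than chance. Both runs would need the same input data and number of metaclusters for the comparison to be meaningful.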

berombau commented 3 months ago

Hi everyone, thank you for the discussion. We're ok with these proposed actions. The flowsom package itself currently depends on pytometry for the function normalize_estimate_logicle, but I'll try to make this and other dependencies optional so you can easily integrate flowsom with minimal dependencies. The MuData dependency is minimal and will probably stay.

The current implementation depends on Numba for speed. There is ongoing work on a batched SOM training update that would further increase parallelization, which we hope to conclude by the summer.

Alternative versions can reuse the scverse integration and visualizations of our package by implementing flowsom.models.BaseFlowSOMEstimator. It's even possible to mix-and-match the models for overclustering and metaclustering, but this is mostly for benchmarking. We can add additional models if that would provide better continuity for users. We can try to make the results as consistent as possible, but this is indeed not trivial. Sometimes there are slight differences and a previous analysis will not be fully reproducible. In that case, it's easier to work with containers or an older package version.

berombau commented 2 months ago

I added some of these changes in https://github.com/saeyslab/FlowSOM_Python/tree/interop-pytometry in preparation for a 0.0.2 version. The pytometry package is now an additional install, as explained in the notebook. We do require version 0.1.5, which is not yet released on PyPI (see https://github.com/scverse/pytometry/issues/69).

mbuttner commented 2 months ago

Hi @berombau, thank you for the update. My PyPI account is still not operational and there has been no response to my account recovery request from PyPI in the past two months. I'll keep working on it!

mbuttner commented 1 month ago

Hi @berombau I recovered access to my PyPI account and uploaded pytometry version 0.1.5: https://pypi.org/project/pytometry/0.1.5/

berombau commented 1 month ago

So in FlowSOM_Python we need the 0.1.5 version for the pytometry function normalize_autologicle. A PyPI installation still does not work because of the pandas issue, which is now fixed in https://github.com/burtonrj/consensusclustering/issues/1 and pytometry version 0.1.6.dev5. So with a PyPI 0.1.6 release, I think the installation issue will be resolved.