scverse / pytometry

Flow & mass cytometry analytics.
https://pytometry.readthedocs.io/en/latest/index.html
Apache License 2.0

Numpy 2.x+ compatibility #80

Status: open. Zethson opened this issue 1 month ago.

Zethson commented 1 month ago

Description of feature

Hi,

The CI, and therefore users, currently install NumPy 1.26 because fcsparser (https://github.com/eyurtsev/fcsparser) is not compatible with NumPy 2.0.0, and therefore neither is readfcs.

What is the long-term plan here? According to the NumPy 2.0.0 issue, fcsparser (https://github.com/eyurtsev/fcsparser) is no longer maintained.

grst commented 1 month ago

Maybe this is a reason to switch to @whitews's FlowIO?

There was a lengthy discussion in https://github.com/scverse/pytometry/issues/47 about even moving FlowUtils to scverse, but it went stale at some point.

whitews commented 1 month ago

Hi Gregor and Lukas,

FlowIO is an essential dependency of FlowKit and as such is actively maintained. I've been tempted to add NumPy as a dependency to FlowIO, but felt it was best to leave it with zero dependencies for situations like this. Its purpose is to read and write FCS files, and that's what it does.
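For context on what a zero-dependency FCS reader has to handle, here is a minimal stdlib-only sketch (not FlowIO's actual code) that parses the fixed-width HEADER segment from the FCS standard: a 6-byte version string, four spaces, then six right-aligned 8-byte ASCII byte offsets for the TEXT, DATA and ANALYSIS segments. The function name `parse_fcs_header` and the returned dict layout are hypothetical.

```python
def parse_fcs_header(header: bytes) -> dict:
    """Parse the 58-byte fixed-width HEADER segment at the start of an FCS file.

    Hypothetical helper, stdlib only. Layout: bytes 0-5 version, bytes 6-9
    spaces, then six right-aligned 8-byte ASCII offsets (TEXT, DATA and
    ANALYSIS, start and end for each).
    """
    if len(header) < 58:
        raise ValueError("FCS HEADER segment must be at least 58 bytes")
    version = header[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError(f"not an FCS file: {version!r}")

    names = ("text_start", "text_end", "data_start", "data_end",
             "analysis_start", "analysis_end")
    result = {"version": version}
    for i, name in enumerate(names):
        field = header[10 + 8 * i:18 + 8 * i].decode("ascii").strip()
        # A blank field is read as 0. In FCS 3.x the DATA offsets may be 0 here,
        # in which case $BEGINDATA/$ENDDATA in the TEXT segment hold the real values.
        result[name] = int(field) if field else 0
    return result


# Build a synthetic header for illustration and parse it.
offsets = [58, 1023, 1024, 5023, 0, 0]
header = b"FCS3.1    " + b"".join(f"{o:>8}".encode("ascii") for o in offsets)
print(parse_fcs_header(header))
```

With a real file, the input would simply be the first 58 bytes, e.g. `open(path, "rb").read(58)`.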

If you all do choose to use FlowIO, I'd point out one caveat. Since it reads the event data exactly as it is stored in the FCS file, any gain defined for a channel in the metadata (the $PnG keyword) is not applied. This pre-processing is rather straightforward to implement, and an example of doing so can be found in the FlowKit Sample class constructor. Alternatively, you could use FlowKit directly to get this functionality from the Sample class, along with the transforms implemented in FlowUtils and other features.
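A minimal sketch of that gain step, assuming the FCS convention that linear-amplified channels ($PnE with a first value of 0, e.g. "0,0") are divided by their $PnG value. This is not FlowKit's actual code; the helper name `apply_gain` and its inputs (a flat, row-major sequence of raw event values, plus a dict of TEXT keywords) are hypothetical. Log-amplified channels would additionally need the $PnE decade scaling, which this sketch leaves out.

```python
from array import array


def apply_gain(flat_events, channel_count, keywords):
    """Return events as rows, with linear channels divided by their $PnG gain.

    Hypothetical helper: `flat_events` is a flat, row-major sequence of
    event values (one block of `channel_count` values per event), and
    `keywords` maps FCS TEXT keywords such as "$P1G" to string values.
    """
    if len(flat_events) % channel_count != 0:
        raise ValueError("event data length is not a multiple of channel_count")

    divisors = []
    for n in range(1, channel_count + 1):  # FCS channels are numbered from 1
        # A missing or blank $PnG means no gain.
        gain = float(keywords.get(f"$P{n}G", "") or "1")
        # Treat a missing $PnE as linear; a first value of 0 means linear.
        decades = float(keywords.get(f"$P{n}E", "0,0").split(",")[0])
        # Only apply the gain to linear channels, and ignore a zero gain.
        divisors.append(gain if decades == 0.0 and gain != 0.0 else 1.0)

    rows = []
    for start in range(0, len(flat_events), channel_count):
        event = flat_events[start:start + channel_count]
        rows.append([value / d for value, d in zip(event, divisors)])
    return rows


# Two events, two channels: channel 1 is linear (gain 2), channel 2 is log.
raw = array("f", [100.0, 10.0, 200.0, 1000.0])
kw = {"$P1G": "2.0", "$P1E": "0,0", "$P2G": "4.0", "$P2E": "4,1"}
print(apply_gain(raw, 2, kw))
```

In practice the rows would be converted to a NumPy array and divided column-wise; plain Python is used here only to keep the sketch dependency-free, in the spirit of FlowIO.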

Hope this helps, Scott

Zethson commented 3 weeks ago

@mbuttner what do you think?

@whitews if Maren thinks that this is a good idea, would you be willing to make the changes in Pytometry to onboard FlowIO? You'd probably be best suited for this task. I can't comment on the preprocessing topic you outlined; I consider y'all the actual experts on that.

whitews commented 3 weeks ago

If Maren is onboard, I'll give it a shot.

mbuttner commented 3 weeks ago

@Zethson Lukas, thank you for bringing this up. I agree that onboarding to FlowIO for the read/write functions of FCS files is the most reasonable way forward.

@whitews Scott, as for your point about whether to use FlowIO or FlowKit: I prefer to keep the dependencies lightweight, and I like your suggestion to use FlowIO and apply the gain when the AnnData object is generated from the data.
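One hypothetical way to prepare the channel (variable) annotations when building the AnnData object: collect $PnN, $PnS and $PnG per channel from the TEXT keywords, and check that the short names are unique before using them as `var_names`. The helper `channel_table` and its column names are assumptions, not pytometry's API.

```python
def channel_table(keywords, channel_count):
    """Collect per-channel annotations from FCS TEXT keywords.

    Hypothetical helper: `keywords` maps keywords such as "$P1N" to strings.
    Returns one dict per channel, in channel order.
    """
    table = []
    for n in range(1, channel_count + 1):
        table.append({
            # $PnN (short channel name) is a required keyword.
            "channel": keywords[f"$P{n}N"],
            # $PnS (marker or stain name) is optional; default to an empty string.
            "marker": keywords.get(f"$P{n}S", ""),
            # $PnG (gain) is optional; a missing or blank value means no gain.
            "gain": float(keywords.get(f"$P{n}G", "") or "1"),
        })

    # AnnData warns about non-unique var_names, so fail early on duplicates.
    names = [row["channel"] for row in table]
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate $PnN channel names: {names}")
    return table


keywords = {
    "$P1N": "FSC-A", "$P1G": "1.0",
    "$P2N": "FL1-A", "$P2S": "CD3", "$P2G": "2.5",
}
for row in channel_table(keywords, 2):
    print(row)
```

The resulting rows could then become the `var` table (e.g. a `pandas.DataFrame` indexed by the channel names) passed to `anndata.AnnData`, with the gain-corrected event matrix as `X`.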

So I'm absolutely onboard with this change! Please let me know if I can support you here further.

mbuttner commented 1 day ago

Hi @whitews, any news on the implementation from your side? Please let me know.