taborlab / FlowCal

Python Flow Cytometry Calibration Library
MIT License
49 stars 23 forks source link

Remove `transform` module and update `FCSData` class and `mef` and `compensate` modules #344

Open JS3xton opened 3 years ago

JS3xton commented 3 years ago

After some reflection in #340, @castillohair and I agreed the transform module should be removed and code previously interfacing it should be simplified.

Tasks:

Unresolved issues:


Relevant discussion from #340:

@castillohair:

I've become skeptical about the need to have a dedicated module for "transformations". I think this came out of our old view that it was worth distinguishing between "channel" units and "a.u.", and therefore having a module that transformed between these two. But having worked with a lot of flow cytometry data, including data from more modern instruments which are stored directly in a.u., I started seeing channel units as an intermediate step that should not be used for anything. If present-day me had to remake FlowCal from scratch, I'd probably have FCSData objects be directly converted to a.u. upon loading, and eliminate the transform module, since we never used it for anything other than the to_rfi() function. That way FCSData objects from old and new instruments will be automatically in a.u., improving consistency.

@JS3xton:

I've been reflecting on the transform module. Some thoughts:

  • It's always hard for me to remember how the MEF transformation traces its way through the mef and transform modules. I would be in favor of simplifying its derivation and exposure to the user.
  • I think I originally thought there were going to be a lot more transformations we would want to support (e.g., log, logicle, etc.). In practice, those have largely manifested themselves in the plot module.
  • The transform module is still useful for FCSData bookkeeping (e.g., making a copy of the FCSData object and updating FCSData.range()). I could envision this functionality being absorbed into FCSData, though (e.g., via a FCSData.transform() function).
  • The transform module is also still useful for applying transforms to non-FCSData data (e.g., a numpy array) in a standardized way. I don't know how many users use non-FCSData data, though. Moreover, if transform.to_mef() and transform.to_compensated() were moved back to their respective modules, those functions could still be written to support non-FCSData arrays.
  • I agree transform.to_rfi() might make more sense as an internal processing step of FCSData and doesn't really need to be exposed to the user as overtly as it currently is.
  • The mef module currently kind of bends over backwards to interface with the transform module (e.g., by producing a transformation function via mef.get_transform_fxn()). If that driving rationale is removed, the primary mef module interface point could possibly be simplified. I'm not sure what form it (and compensate) should take to simplify them, though. (Do we still return transformation functions? Do the modules provide functions that operate directly on data, like gate module functions?)

Upon reflection, I currently favor removing transform, updating FCSData, and simplifying the common mef and compensate module interfaces.