After some reflection in #340, @castillohair and I agreed the transform module should be removed and code previously interfacing it should be simplified.
Tasks:
Remove transform module.
Modify FCSData to automatically transform data to RFI units.
Add FCSData.transform() function to support generic transformations.
Consolidate compensate.get_transform_fxn() and transform.to_compensated() and return all unmixed fluorescence signals.
Consolidate mef.get_transform_fxn() and transform.to_mef(). Update excel_ui accordingly.
Unresolved issues:
What form should the consolidated compensate function take?
What data structure(s) should be returned? (E.g., a numpy array, pd.DataFrame or FCSData object for each single-fluorophore control?)
Would a full_output flag (like with the gate module functions) provide better control over the amount of information returned?
Should the user be able to specify their own spillover matrix?
What form should the consolidated mef function take?
Relevant discussion from #340:
@castillohair:
I've become skeptical about the need to have a dedicated module for "transformations". I think this came out of our old view that it was worth distinguishing between "channel" units and "a.u.", and therefore having a module that transformed between these two. But having worked with a lot of flow cytometry data, including data from more modern instruments which are stored directly in a.u., I started seeing channel units as an intermediate step that should not be used for anything. If present-day me had to remake FlowCal from scratch, I'd probably have FCSData objects be directly converted to a.u. upon loading, and eliminate the transform module, since we never used it for anything other than the to_rfi() function. That way FCSData objects from old and new instruments will be automatically in a.u., improving consistency.
@JS3xton:
I've been reflecting on the transform module. Some thoughts:
It's always hard for me to remember how the MEF transformation traces its way through the mef and transform modules. I would be in favor of simplifying its derivation and exposure to the user.
I think I originally thought there were going to be a lot more transformations we would want to support (e.g., log, logicle, etc.). In practice, those have largely manifested themselves in the plot module.
The transform module is still useful for FCSData bookkeeping (e.g., making a copy of the FCSData object and updating FCSData.range()). I could envision this functionality being absorbed into FCSData, though (e.g., via a FCSData.transform() function).
The transform module is also still useful for applying transforms to non-FCSData data (e.g., a numpy array) in a standardized way. I don't know how many users use non-FCSData data, though. Moreover, if transform.to_mef() and transform.to_compensated() were moved back to their respective modules, those functions could still be written to support non-FCSData arrays.
I agree transform.to_rfi() might make more sense as an internal processing step of FCSData and doesn't really need to be exposed to the user as overtly as it currently is.
The mef module currently kind of bends over backwards to interface with the transform module (e.g., by producing a transformation function via mef.get_transform_fxn()). If that driving rationale is removed, the primary mef module interface point could possibly be simplified. I'm not sure what form it (and compensate) should take to simplify them, though. (Do we still return transformation functions? Do the modules provide functions that operate directly on data, like gate module functions?)
Upon reflection, I currently favor removing transform, updating FCSData, and simplifying the common mef and compensate module interfaces.
After some reflection in #340, @castillohair and I agreed the
transform
module should be removed and code previously interfacing it should be simplified.Tasks:
transform
module.FCSData
to automatically transform data to RFI units.FCSData.transform()
function to support generic transformations.compensate.get_transform_fxn()
andtransform.to_compensated()
and return all unmixed fluorescence signals.mef.get_transform_fxn()
andtransform.to_mef()
. Updateexcel_ui
accordingly.Unresolved issues:
compensate
function take?array
,pd.DataFrame
orFCSData
object for each single-fluorophore control?)full_output
flag (like with thegate
module functions) provide better control over the amount of information returned?mef
function take?Relevant discussion from #340:
@castillohair:
@JS3xton: