whitews / FlowKit

A Python toolkit for flow cytometry analysis supporting GatingML and FlowJo workspaces
https://flowkit.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Exception generated when using apply_compensation() when channel is not in compensation matrix #114

Closed: krcurtis closed this issue 2 years ago

krcurtis commented 2 years ago

Hi,

The function apply_compensation() raised an exception while looking for a channel that was not in the supplied compensation matrix. I used the compensation matrix given in the FCS metadata:

import flowkit as fk
s = fk.Sample("mysample")  # placeholder path to the FCS file
s.apply_compensation(s.get_metadata()["spillover"])

I assume CH1-A is forward or side scatter, and presumably that is why it was not in the spillover matrix.

Expected behavior: I expected compensation to be applied only to the channels named in the spillover matrix, which was given in the single-line FCS spill format: '21,IgA_...0.00500700017437,0.0416879989207,1'

Additional context

ValueError                                Traceback (most recent call last)
Input In [54], in <cell line: 1>()
----> 1 s.apply_compensation( s.get_metadata()["spillover"])

File ~/miniconda3-2020/envs/bcellflow/lib/python3.8/site-packages/flowkit/_models/sample.py:400, in Sample.apply_compensation(self, compensation, comp_id)
    398     detectors = [self.pnn_labels[i] for i in self.fluoro_indices]
    399     fluorochromes = [self.pns_labels[i] for i in self.fluoro_indices]
--> 400     self.compensation = Matrix(comp_id, compensation, detectors, fluorochromes)
    401     self._comp_events = self.compensation.apply(self)
    402 else:
    403     # compensation must be None so clear any matrix and comp events

File ~/miniconda3-2020/envs/bcellflow/lib/python3.8/site-packages/flowkit/_models/transforms/_matrix.py:46, in Matrix.__init__(self, matrix_id, spill_data_or_file, detectors, fluorochromes, null_channels)
     44     spill = spill_data_or_file
     45 else:
---> 46     spill = flowutils.compensate.parse_compensation_matrix(
     47         spill_data_or_file,
     48         detectors,
     49         null_channels=null_channels
     50     )
     51     spill = spill[1:, :]
     53 self.id = matrix_id

File ~/miniconda3-2020/envs/bcellflow/lib/python3.8/site-packages/flowutils/compensate.py:210, in parse_compensation_matrix(compensation, channel_labels, null_channels)
    206     else:
    207         # may be a CSV string
    208         matrix_text = compensation
--> 210     matrix = _convert_matrix_text_to_array(matrix_text, fluoro_labels, fluoro_indices)
    212 elif isinstance(compensation, Path):
    213     fh = compensation.open('r')

File ~/miniconda3-2020/envs/bcellflow/lib/python3.8/site-packages/flowutils/compensate.py:100, in _convert_matrix_text_to_array(matrix_text, fluoro_labels, fluoro_indices)
     97 label_diff = set(fluoro_labels).symmetric_difference(header)
     99 # re-order matrix according to provided fluoro label order
--> 100 idx_order = [header.index(fluoro_label) for fluoro_label in fluoro_labels]
    101 matrix = matrix[idx_order, :][:, idx_order]
    103 if len(label_diff) > 0:

File ~/miniconda3-2020/envs/bcellflow/lib/python3.8/site-packages/flowutils/compensate.py:100, in <listcomp>(.0)
     97 label_diff = set(fluoro_labels).symmetric_difference(header)
     99 # re-order matrix according to provided fluoro label order
--> 100 idx_order = [header.index(fluoro_label) for fluoro_label in fluoro_labels]
    101 matrix = matrix[idx_order, :][:, idx_order]
    103 if len(label_diff) > 0:

ValueError: 'CH1-A' is not in list

Channels from sample:

In [53]: s.channels
Out[53]:
   channel_number    pnn pns  png       pnr
0               1   Time      1.0  262144.0
1               2  CH1-A      1.0  262144.0
2               3  CH1-H      1.0  262144.0
3               4  CH2-A      1.0  262144.0
...

whitews commented 2 years ago

I think what is happening here is that the FlowKit Sample class tries to figure out the fluoro channels by eliminating the scatter and time channels. The time channel is identified by "Time" of course, and the scatter channels are identified as any channel beginning with "SSC-" or "FSC-" (case-insensitive). However, in this case the channel names don't help to identify them.
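A rough sketch of the name-based heuristic described above may make the failure easier to see. This is not FlowKit's actual code; `guess_fluoro_indices` and its parameters are hypothetical names used only for illustration, and the first label list is made up.

```python
def guess_fluoro_indices(pnn_labels, time_label="time", scatter_prefixes=("fsc-", "ssc-")):
    """Return zero-based indices of channels that look fluorescent by name.

    Hypothetical helper: it mimics the heuristic described in this thread,
    not the FlowKit implementation itself.
    """
    indices = []
    for i, label in enumerate(pnn_labels):
        lowered = label.lower()
        # Skip the time channel and anything named like a scatter channel.
        if lowered == time_label or lowered.startswith(scatter_prefixes):
            continue
        indices.append(i)
    return indices


# Conventional names: scatter channels are recognised and excluded.
print(guess_fluoro_indices(["Time", "FSC-A", "SSC-A", "FL1-A"]))  # [3]

# Names like the ones in this issue: nothing marks CH1-A as scatter,
# so it is treated as fluorescent and then not found in the spillover header.
print(guess_fluoro_indices(["Time", "CH1-A", "CH1-H", "CH2-A"]))  # [1, 2, 3]
```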

One workaround for this is to set the fluoro_indices attribute manually. It's just a list of indices (zero-indexed) corresponding to the channel indices that are the actual fluorescent channels. Then the apply_compensation method should work without the error.
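If the labels in the spillover header are known, the index list can be derived from them instead of being counted by hand. The sketch below assumes the header labels match the sample's PnN labels exactly (the traceback shows these are available as `pnn_labels` on the sample); `indices_for_labels` is a hypothetical helper and the channel names are illustrative only.

```python
def indices_for_labels(pnn_labels, matrix_labels):
    """Map spillover-matrix labels to zero-based channel indices.

    Hypothetical helper. Assumes every matrix label appears verbatim
    among the sample's PnN labels; raises ValueError otherwise.
    """
    missing = [name for name in matrix_labels if name not in pnn_labels]
    if missing:
        raise ValueError(f"labels not found among sample channels: {missing}")
    # Return the indices in channel order, regardless of matrix column order.
    return sorted(pnn_labels.index(name) for name in matrix_labels)


# Illustrative labels only, not taken from the reporter's file.
pnn_labels = ["Time", "CH1-A", "CH1-H", "FL1-A", "FL2-A"]
print(indices_for_labels(pnn_labels, ["FL2-A", "FL1-A"]))  # [3, 4]
```

The resulting list is what would be assigned to the sample's fluoro_indices attribute before calling apply_compensation.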

krcurtis commented 2 years ago

I can try that. Is there a helper routine to parse the FCS single line format that is exposed as part of the API?

krcurtis commented 2 years ago

Did you mean I should add a fluoro_indices keyword argument? fluoro_indices is not a keyword argument in flowkit 0.8.2, and I don't see it in the current version on GitHub in sample.py. I suppose I could generate a full identity matrix and populate it with the spillover matrix from the FCS single-line format. Are the values in the FCS single-line format given in row-major or column-major order?
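On the ordering question: to my understanding of the FCS 3.1 standard, the single-line spillover value is the matrix size n, then n detector labels, then n×n coefficients in row-major order. The traceback above shows that flowutils.compensate.parse_compensation_matrix already accepts this string form. The stand-alone sketch below only illustrates the layout; `parse_spill_string` is a hypothetical name and the example string is made up.

```python
def parse_spill_string(spill):
    """Split an FCS single-line spillover value into (labels, matrix rows).

    Hypothetical helper, assuming the layout 'n,label1..labeln,v1..v(n*n)'
    with values in row-major order.
    """
    fields = [f.strip() for f in spill.split(",")]
    n = int(fields[0])
    labels = fields[1:n + 1]
    values = [float(v) for v in fields[n + 1:]]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} values, found {len(values)}")
    # Row-major: the first n values form row 0, the next n form row 1, and so on.
    matrix = [values[r * n:(r + 1) * n] for r in range(n)]
    return labels, matrix


# Made-up two-channel example.
labels, matrix = parse_spill_string("2,FL1-A,FL2-A,1,0.05,0.02,1")
print(labels)  # ['FL1-A', 'FL2-A']
print(matrix)  # [[1.0, 0.05], [0.02, 1.0]]
```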

whitews commented 2 years ago

Your sample instance will have it as an attribute. Say you know that the channel indices 5 - 8 are fluorescent channels, the pseudo-code would be like:

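import flowkit as fk
sample = fk.Sample("/path/to/flow.fcs")
sample.fluoro_indices = [5, 6, 7, 8]  # zero-based indices of the fluorescent channels
sample.apply_compensation(sample.get_metadata()["spillover"])

krcurtis commented 2 years ago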

That fixed my issue