I did some tests with the data.001 file in the examples directory.
>>> import fc
>>> import time
>>> d = fc.io.TaborLabFCSData('examples/FCFiles/data.001')
According to http://onlinelibrary.wiley.com/doi/10.1002/cyto.990110303/abstract, the $BTIM and $ETIM keywords record the times at which acquisition began and ended. http://murphylab.web.cmu.edu/FCSAPI/FCS3.html says that the same is true in FCS3.0.
>>> d.text['$BTIM']
'20:57:49'
>>> d.text['$ETIM']
'20:58:09'
>>> time.mktime(time.strptime(d.text['$ETIM'], '%H:%M:%S')) - time.mktime(time.strptime(d.text['$BTIM'], '%H:%M:%S'))
20.0
In addition, the 'Time' channel registers the time at which each event is detected.
>>> d[:,'Time']
TaborLabFCSData([ 0, 0, 0, ..., 388, 388, 388], dtype=uint16)
>>> max(d[:,'Time'])
388
However, the units of this channel are not obvious. The FCS3.0 standard has the $TIMESTEP text attribute, which indicates the value of each step in seconds as a floating-point number (http://murphylab.web.cmu.edu/FCSAPI/FCS3.html). However, the FCS2.0 specification doesn't have a similar parameter (http://onlinelibrary.wiley.com/doi/10.1002/cyto.990110303/abstract). Some less reliable sources mention the TIMETICKS attribute, which specifies the value of each step in milliseconds (flowrepository.org/experiments/4/attachments/786/download). If we assume this to be true, then
>>> d.text['TIMETICKS']
'50'
>>> max(d[:,'Time'])*float(d.text['TIMETICKS'])/1000.
19.399999999999999
which is very close to the 20 seconds calculated above from $BTIM/$ETIM.
Considering that we will be using FCS3.0 files soon, I can think of three solutions to this:
In addition, I propose replacing the "Gated Counts/millisecond" column with "Acquisition Time (s)". This will allow the user to calculate gated or ungated counts/second using standard Excel functionality.
Opinions? @BrianLandry @thoreusc @JS3xton
I agree with everything that you've uncovered:

TIMETICKS is a BD acquisition (CellQuest)-specific keyword (not in the standards) which can be trusted to convert the time signal to seconds for the FCS2.0 files. TIMETICKS will be deprecated, though, as we move to FlowJo FCS3.0 files (or for any other non-BD acquisition software). FCS3.0 has the $TIMESTEP keyword, which can be used instead.

I'm leery of (1), but it's probably good enough for a first-order approximation (it's also the only option, as you mention, if you don't have a time signal). Depends on what the user wants: fitting a line uses the whole time signal rather than relying on two points. However, when I plot # counts v. time I don't always see a nice line (i.e. the event rate changes; you get kinks/elbows). Unclear what should be done here; depends on the application. This could also be exacerbated (or alleviated) by gating, e.g. using fc.gate.start_stop() to throw out portions of the data (see the sketch below).

My first impression is that, philosophically, I don't like (2) and (3) because I feel like they bog down the FCS file objects, but the jury is still out for me. If you're automatically preprocessing the time signal, why aren't you also preprocessing the fluorescence channels? i.e. there should never be a "raw" numpy array with channel #s in it; it should only ever be MEFed data. If you don't do this, then you've arbitrarily decided that the time signal is worth preprocessing, but the rest of the channels are not.

I'm also in favor of just giving the acquisition duration in the Excel spreadsheet. This is a number that you are confident of; very few assumptions go into that number. Let the user make assumptions if they're comfortable with them and calculate their event rate on their own (do you also output the total number of gated and ungated events? I don't know; I haven't used it yet).
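For illustration, a minimal sketch of the kind of line fit described above, assuming the TIMETICKS-based scaling from earlier in the thread (the variable names here are illustrative, not part of fc):

import numpy as np
import fc

d = fc.io.TaborLabFCSData('examples/FCFiles/data.001')
t = np.asarray(d[:, 'Time'], dtype=float)  # per-event detection time, in ticks
n = np.arange(len(t))                      # cumulative event count
# The slope of a least-squares line through (time, count) is the event
# rate in events/tick; dividing by the tick length converts to events/s.
events_per_tick = np.polyfit(t, n, 1)[0]
tick_seconds = float(d.text['TIMETICKS'])/1000.  # TIMETICKS is in ms
events_per_sec = events_per_tick/tick_seconds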
I believe the time channel is different because it is the only quantity for which there is almost universal agreement about the appropriate units (unless you ask a hardcore physicist). As you said, I can't think of any reason not to use seconds or milliseconds for time information. However, I also don't like the idea of modifying the time channel right when the file is loaded. Maybe a transform function could be implemented to handle this?
I didn't consider fitting a line to # events vs. time. I ran some tests and got the following results.

Raw data:
events/sec (BTIM/ETIM method): 930.35
events/sec (events method): 959.12
events/sec (slope method): 964.77
Error in BTIM/ETIM method: 3.00%
Error in slope method: 0.59%
After start/end trimming:
events/sec (BTIM/ETIM method): 912.85
events/sec (events method): 958.37
events/sec (slope method): 965.53
Error in BTIM/ETIM method: 4.75%
Error in slope method: 0.74%
After low/high gating:
events/sec (BTIM/ETIM method): 859.75
events/sec (events method): 902.62
events/sec (slope method): 909.07
Error in BTIM/ETIM method: 4.75%
Error in slope method: 0.71%
After density gating (30%):
events/sec (BTIM/ETIM method): 257.95
events/sec (events method): 270.81
events/sec (slope method): 272.84
Error in BTIM/ETIM method: 4.75%
Error in slope method: 0.74%
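For reference, a minimal sketch of how the BTIM/ETIM and events estimates above could be computed (my reconstruction, not the exact test code; the slope estimate is sketched earlier in the thread):

import time
import fc

d = fc.io.TaborLabFCSData('examples/FCFiles/data.001')

# BTIM/ETIM method: total events over the wall-clock acquisition window.
t_begin = time.mktime(time.strptime(d.text['$BTIM'], '%H:%M:%S'))
t_end = time.mktime(time.strptime(d.text['$ETIM'], '%H:%M:%S'))
rate_btim_etim = len(d)/(t_end - t_begin)

# Events method: total events over the span of the Time channel,
# converted to seconds using TIMETICKS (milliseconds per tick).
tick_seconds = float(d.text['TIMETICKS'])/1000.
time_span = (max(d[:, 'Time']) - min(d[:, 'Time']))*tick_seconds
rate_events = len(d)/time_span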
Highlights:
In my opinion, the precision difference that results from using the slope method as opposed to the events method is negligible, and I'm not convinced that the slope method is theoretically more correct.
After checking @JS3xton's comments and running some analysis, I propose the following changes:

Add a time_scaling property to the FCSData object. time_scaling will read the appropriate text field and return a value such that data[:,'Time']*data.time_scaling should be in seconds.

Add an acquisition_time property. This will be calculated as data.time_scaling*(data[-1, 'Time'] - data[0, 'Time']) if Time is a valid channel; otherwise it will use the BTIM/ETIM method.

A transform function that converts time to seconds (for the lazy, like myself) may be implemented in the future.
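A minimal sketch of how these two properties might behave, written here as standalone functions for clarity (this illustrates the proposal and is not the final implementation):

import time

def time_scaling(data):
    # Proposed time_scaling: seconds per tick of the Time channel.
    if '$TIMESTEP' in data.text:    # FCS3.0: tick length in seconds
        return float(data.text['$TIMESTEP'])
    elif 'TIMETICKS' in data.text:  # CellQuest FCS2.0: tick length in ms
        return float(data.text['TIMETICKS'])/1000.
    else:
        raise KeyError('no time scaling information in TEXT segment')

def acquisition_time(data):
    # Proposed acquisition_time: use the Time channel if available,
    # otherwise fall back to the BTIM/ETIM method.
    try:
        return time_scaling(data)*float(data[-1, 'Time'] - data[0, 'Time'])
    except Exception:
        t_begin = time.mktime(time.strptime(data.text['$BTIM'], '%H:%M:%S'))
        t_end = time.mktime(time.strptime(data.text['$ETIM'], '%H:%M:%S'))
        return t_end - t_begin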
I will wait a couple of hours for comments, otherwise I'll start implementing this.
@castillohair's proposed changes sound good to me. I like outputting the acquisition time instead of a rate too.
Solved in e9ecffd4362481455cb8d8675527d21cc0ecd13c.
Values in the "Gated Counts/millisecond" column of the output Excel file range from 7 to 13. Taking the ~30% gated fraction into account, this implies these samples were acquired at roughly 23000 to 43000 events/second (e.g. 7/0.3 ≈ 23 events/ms), which is unusually high. My guess is that the calculation of counts/millisecond is wrong.