Data synchronisation example

RoboDoig commented 2 months ago

Summary from @ikharitonov

Steps:
• Loading
• Applying unit conversion to the optical tracking sensor data, based on the datasheet
• Loading the fluorescence signal as a dataframe
• Selecting the rows where events occur (e.g. 628 out of 19327 rows in total)
• Taking the values from the 'Seconds' column of OnixDigital (also 628 rows, so the counts match) and assigning them to those rows of the Fluorescence dataframe
• Then, in order to interpolate between these values (i.e. infer the HARP seconds timestamp for the entire fluorescence signal), estimating the very first and the very last 'Seconds' values from the photometry software's timestamps ('TimeStamp' column)
• Applying default Pandas interpolation
• Plotting
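The steps above can be sketched roughly as follows. This is a minimal sketch rather than the notebook code: the helper name `anchor_interpolate_seconds` is hypothetical, and it assumes the event rows are given as integer row positions and that the first and last 'Seconds' values are extrapolated at the mean rate implied by the outermost anchors.

```python
import numpy as np
import pandas as pd


def anchor_interpolate_seconds(timestamp, anchor_idx, anchor_seconds):
    """Infer a HARP 'Seconds' value for every fluorescence row (hypothetical helper).

    timestamp      -- photometry-software 'TimeStamp' for every row
    anchor_idx     -- integer row positions where an event occurred
    anchor_seconds -- OnixDigital 'Seconds' for the same events, in order
    """
    timestamp = np.asarray(timestamp, dtype=float)
    anchor_idx = np.asarray(anchor_idx, dtype=int)
    anchor_seconds = np.asarray(anchor_seconds, dtype=float)
    if anchor_idx.size != anchor_seconds.size:
        raise ValueError("each event row needs exactly one OnixDigital 'Seconds' value")

    # Start with NaN everywhere and fill in the anchor rows
    seconds = pd.Series(np.nan, index=np.arange(timestamp.size))
    seconds.iloc[anchor_idx] = anchor_seconds

    # Assumption: mean HARP seconds per TimeStamp unit between the outermost anchors
    first, last = anchor_idx[0], anchor_idx[-1]
    rate = (anchor_seconds[-1] - anchor_seconds[0]) / (timestamp[last] - timestamp[first])

    # Estimate the very first and very last values so every row lies between known values
    seconds.iloc[0] = anchor_seconds[0] - (timestamp[first] - timestamp[0]) * rate
    seconds.iloc[-1] = anchor_seconds[-1] + (timestamp[-1] - timestamp[last]) * rate

    # Default pandas interpolation: linear in row position (TimeStamp is not used here)
    return seconds.interpolate()
```

Note that `interpolate()` with its default `method='linear'` treats rows as equally spaced. Setting 'TimeStamp' as the index and passing `method='index'` would instead interpolate in photometry time, which may matter if photometry samples are not evenly spaced.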

Here are some points regarding this approach that I wanted to clarify with you:

1) The main assumption here is that there is a common kind of timestamp throughout the logs and data streams, which I understand to be the HARP timestamp. It is usually expressed either in datetime format (e.g. 1904-01-06 01:46:41.470240) or as seconds elapsed since harp.REFERENCE_EPOCH, which is 1904-01-01T00:00:00. The second format is always found in the 'Seconds' column of logs such as ExperimentEvents, OnixAnalogFrameCount, OnixDigital and VideoData. The first format is used for the HARP H1 and H2 data streams, for example the optical tracking sensor. In principle, this makes it possible to align all of these streams, at least roughly. Am I thinking about this correctly?
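A minimal sketch of converting between the two formats, assuming the epoch value quoted above. `REFERENCE_EPOCH` is redefined locally here rather than imported from the harp package, and the two helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Local copy of the epoch quoted above for harp.REFERENCE_EPOCH (1904-01-01T00:00:00)
REFERENCE_EPOCH = datetime(1904, 1, 1)


def seconds_to_datetime(seconds: float) -> datetime:
    """HARP 'Seconds' (elapsed since the epoch) -> datetime."""
    return REFERENCE_EPOCH + timedelta(seconds=seconds)


def datetime_to_seconds(dt: datetime) -> float:
    """datetime -> HARP 'Seconds' (elapsed since the epoch)."""
    return (dt - REFERENCE_EPOCH).total_seconds()


print(seconds_to_datetime(438401.47024))  # 1904-01-06 01:46:41.470240
```

So the example datetime above corresponds to a 'Seconds' value of about 438401.47, which is what makes the two formats directly comparable.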

2) My approach to aligning photometry might be quite inaccurate, because it infers 'Seconds' values for all samples from very few "anchor" values (corresponding to photometry events). This gets especially unreliable when extrapolating the first and the last 'Seconds' values, because 'TimeStamp' and 'Seconds' do not seem to have consistent inter-timestamp periods. Do you think this is too far off?
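One way to quantify the concern about inconsistent periods is to compare the HARP seconds elapsed per 'TimeStamp' unit between consecutive anchors; a large spread suggests linear interpolation between anchors is unreliable. This is a hypothetical diagnostic, not something taken from the notebook.

```python
import numpy as np


def anchor_rate_stats(anchor_timestamp, anchor_seconds):
    """Mean and standard deviation of HARP seconds per TimeStamp unit
    between consecutive anchor events (hypothetical helper)."""
    ts = np.asarray(anchor_timestamp, dtype=float)
    sec = np.asarray(anchor_seconds, dtype=float)
    rates = np.diff(sec) / np.diff(ts)
    return rates.mean(), rates.std()


mean_rate, std_rate = anchor_rate_stats([0, 10, 20, 30], [0.0, 0.1, 0.2, 0.3])
# A relative spread (std_rate / mean_rate) close to zero means consistent periods
```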

3) As we discussed before, the more accurate approach is to use either the clock or the photodiode data, which are supposed to have a much higher resolution. However, we are still quite confused about how to use them. See the last section of the notebook, where I plotted what we have so far. The datatype and shape they are loaded with should be correct, based on our last meeting in July and the docs https://open-ephys.github.io/onix-docs/Software%20Guide/Bonsai.ONIX/Nodes/AnalogIODevice.html

3.1) The main question is about alignment. My understanding is that each buffer (100 samples in this example) of the photodiode signal (OnixAnalogData) is associated with a single row of the OnixAnalogFrameCount log file, i.e. a single HARP timestamp. Is the strategy then to treat each spike within the buffer (seen in the plots) as a higher-resolution timestamp and extrapolate, similarly to what I did above, using the HARP timestamps from OnixAnalogFrameCount? Where exactly does the OnixAnalogFrameCount HARP timestamp fall within the buffer: at its beginning or its end? Looking at OnixAnalogFrameCount, I've also seen many cases where large numbers of rows have exactly the same HARP timestamp. Why could this be?

3.2) We discussed the states of the photodiode with Nora, and we assumed there should be three different states: gray, black and white. However, the data do not seem to show this?

3.3) The OnixAnalogClock signal shows a seemingly periodic pattern whose period almost (but not quite) matches the buffer size of 100 samples. How should we interpret and use it?

RoboDoig commented 2 months ago

@ikharitonov have a look at the notebook I added in PR #77

My approach here is first to create a conversion between the ONIX and HARP clocks. Since the analog frame count has paired HARP and ONIX timestamps, we can use it to create a linear fit between the two. Because ONIX runs much faster than HARP, you'll often find that the pairing is not unique (e.g. the same HARP timestamp for multiple analog frame counts), but the linear fit should still work well over the full data for a sufficiently long recording.
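A minimal sketch of such a linear fit, using `numpy.polyfit` with degree 1 on the paired timestamps. The helper name and the synthetic data are hypothetical; the synthetic HARP times are rounded to 1 ms so that several ONIX frames share each HARP value, mimicking the non-unique pairing described above.

```python
import numpy as np


def fit_clock_conversion(src_times, dst_times):
    """Least-squares line dst ~ slope * src + intercept (hypothetical helper)."""
    slope, intercept = np.polyfit(
        np.asarray(src_times, dtype=float), np.asarray(dst_times, dtype=float), deg=1
    )
    return slope, intercept


# Synthetic example: ONIX ticks at an illustrative 100 kHz, HARP time = 5 s + ticks / 1e5
onix_ticks = np.arange(0, 1_000_000, 10)
# Rounding to 1 ms makes roughly 10 consecutive ONIX frames share each HARP value
harp_seconds = np.round(5.0 + onix_ticks / 1e5, 3)
slope, intercept = fit_clock_conversion(onix_ticks, harp_seconds)
```

Despite the repeated HARP values, the fitted slope and intercept come out very close to the true 1e-5 s per tick and 5 s, because the rounding errors average out over many pairs.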

Once we have this conversion, we can look at the timestamps of input events on the photometry system, which should correspond to digital events on ONIX. This gives us a pairing between ONIX time and photometry time, and we can repeat the linear fit process.
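A minimal sketch of the ONIX-to-photometry step, assuming the trigger events can be paired by order (the n-th ONIX digital event with the n-th photometry input event). That assumption only holds if no event was dropped on either side, so the sketch refuses mismatched counts. The function name is hypothetical.

```python
import numpy as np


def fit_onix_to_photometry(onix_event_times, photometry_event_times):
    """Pair trigger events by order and fit photometry_time ~ slope * onix_time + intercept."""
    onix = np.asarray(onix_event_times, dtype=float)
    phot = np.asarray(photometry_event_times, dtype=float)
    # Pairing by order is only valid if both sides recorded the same events
    if onix.shape != phot.shape:
        raise ValueError(
            f"event count mismatch: {onix.size} ONIX vs {phot.size} photometry events"
        )
    slope, intercept = np.polyfit(onix, phot, deg=1)
    return slope, intercept
```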

Once we have the HARP-->ONIX and ONIX-->Photometry conversions, we can define timestamp conversion functions between all three clocks.
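Because each fitted conversion is linear, conversions can be chained and inverted analytically rather than fitted again. A minimal sketch with a hypothetical `LinearMap` class; the slope and intercept values in the usage lines are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearMap:
    """t_out = slope * t_in + intercept (hypothetical helper)."""
    slope: float
    intercept: float

    def __call__(self, t):
        return self.slope * t + self.intercept

    def inverse(self) -> "LinearMap":
        # Solve t_out = slope * t_in + intercept for t_in
        return LinearMap(1.0 / self.slope, -self.intercept / self.slope)

    def then(self, other: "LinearMap") -> "LinearMap":
        # Apply self first, then other: other(self(t))
        return LinearMap(other.slope * self.slope,
                         other.slope * self.intercept + other.intercept)


# Illustrative values only, not fitted from real data
harp_to_onix = LinearMap(2.0, 1.0)
onix_to_photometry = LinearMap(3.0, -4.0)
harp_to_photometry = harp_to_onix.then(onix_to_photometry)
photometry_to_harp = harp_to_photometry.inverse()
```

With these two fitted maps, all six directions between HARP, ONIX and photometry follow from `then` and `inverse`.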

3.2: I think there are 3 states of the photodiode, but in practice the black state is very rarely seen because it corresponds to Idle. If you look at the full photodiode data, there are 3 levels, but the lowest one only occurs right at the start; the rest of the time you are switching from white to grey (active visual environment to halt).

3.3: See the notebook in #77. Once you load and reshape the analog data / analog clock, you don't need to worry about the buffer size anymore: you have an Nx12 data array with N clock samples.
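A minimal sketch of such a reshape, assuming the raw file stores each 100-sample buffer of the 12 channels contiguously. Whether each buffer is laid out channel by channel or sample by sample depends on how the data were written, so the `channel_major` flag is an assumption to check against the notebook in #77; the function name is hypothetical.

```python
import numpy as np

N_CHANNELS = 12    # from the Nx12 shape mentioned above
BUFFER_SIZE = 100  # buffer size used in this recording


def reshape_analog(raw, n_channels=N_CHANNELS, buffer_size=BUFFER_SIZE, channel_major=True):
    """Turn a flat array of buffered samples into an (N samples, n_channels) array."""
    raw = np.asarray(raw)
    if channel_major:
        # Assumed layout: each buffer stored as (channels, samples)
        buffers = raw.reshape(-1, n_channels, buffer_size)
        return buffers.transpose(0, 2, 1).reshape(-1, n_channels)
    # Assumed layout: samples already interleaved across channels
    return raw.reshape(-1, n_channels)
```

After this step the row index is simply the sample index, so each row lines up with one value of the (flattened) analog clock.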

RoboDoig commented 2 months ago

Regarding 2: for data streams with different clocks, at some point you do have to rely on interpolation. If ONIX is running at 100000 Hz and HARP at 1 Hz (not really, just for illustration) and you want to know what happened in ONIX between HARP times 1 s and 2 s, you need to do some interpolation.
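A toy version of that illustration, using `numpy.interp` to assign a HARP time to every ONIX sample from 1 Hz anchor pairs and then selecting the samples between 1 s and 2 s. The rates are the illustrative ones from the text, not the real ones.

```python
import numpy as np

ONIX_RATE_HZ = 100_000  # illustrative, as in the text

onix_ticks = np.arange(3 * ONIX_RATE_HZ)  # 3 s of ONIX samples
# One HARP timestamp per second, paired with the ONIX tick at which it occurred
anchor_ticks = np.array([0, ONIX_RATE_HZ, 2 * ONIX_RATE_HZ])
anchor_harp_s = np.array([0.0, 1.0, 2.0])

# Linear interpolation between anchors; np.interp clamps beyond the last anchor
onix_in_harp_s = np.interp(onix_ticks, anchor_ticks, anchor_harp_s)
window = (onix_in_harp_s >= 1.0) & (onix_in_harp_s < 2.0)
print(window.sum())  # 100000
```

The clamping behaviour is worth keeping in mind: samples after the last HARP anchor all receive the same time, which is one reason a linear fit over the whole recording is preferable to plain interpolation near the edges.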

For this to work we are making some assumptions which I think are fair. One is that the ONIX and photometry clocks are regular and linear enough that we can do the timestamp conversion by simply linearly regressing matched timestamps (via the digital trigger from ONIX to photometry). The HARP clock itself should also be regular and linear, but we don't see it that way because we only get a timestamp from HARP when there is a Read, Write or Event message, so the intervals between the timestamps we receive can be irregular. Luckily, we have fairly regular messages from sensors like the optical flow sensor, so we can also do a linear regression between ONIX and HARP.

For the latter point, this might fail (and it needs to be checked) if we get very few HARP messages (e.g. in a short recording) or if synchronisation between the two devices fails. In both cases this could produce a bad linear fit that would not be interpretable as a timestamp conversion.
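A hypothetical sanity check for these failure modes: refuse to fit with too few pairs, and flag the fit if any residual is large. The thresholds (`min_pairs`, `max_residual_s`) are placeholders, not values from this project.

```python
import numpy as np


def check_clock_fit(src_times, dst_times, min_pairs=100, max_residual_s=0.005):
    """Fit dst ~ slope * src + intercept and report whether it looks usable."""
    src = np.asarray(src_times, dtype=float)
    dst = np.asarray(dst_times, dtype=float)
    # Too few pairs (e.g. a very short recording) gives an unreliable fit
    if src.size < min_pairs:
        raise ValueError(f"only {src.size} timestamp pairs; recording may be too short")
    slope, intercept = np.polyfit(src, dst, deg=1)
    residuals = dst - (slope * src + intercept)
    worst = float(np.abs(residuals).max())
    # A sync failure (e.g. a jump in one clock) shows up as large residuals
    return slope, intercept, worst, worst <= max_residual_s
```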

Finally, since we are essentially doing HARP timestamping of ONIX in software, there can be millisecond-scale errors in the synchronisation. This should be solved by the hardware HARP timestamping that I'll implement next.

RoboDoig commented 2 months ago

I realised today that I made a mistake by upsampling the HARP clocks when estimating the ONIX-to-HARP conversion. The upsampling produces a 'step'-like plot, which biases the linear fit. I should instead downsample the analog clock to match the HARP times.
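A minimal sketch of the downsampling fix, assuming one OnixAnalogFrameCount HARP timestamp per 100-sample buffer and taking one clock sample per buffer to pair with it. Which sample in the buffer the HARP time refers to is the open question from 3.1, so `sample_in_buffer` is an assumption; the function name and synthetic data are hypothetical.

```python
import numpy as np


def downsample_clock_to_frames(analog_clock, harp_seconds, buffer_size=100, sample_in_buffer=0):
    """Pick one ONIX clock value per buffer to pair with that buffer's HARP timestamp."""
    clock = np.asarray(analog_clock, dtype=float).reshape(-1, buffer_size)
    harp = np.asarray(harp_seconds, dtype=float)
    if clock.shape[0] != harp.size:
        raise ValueError(f"{clock.shape[0]} buffers but {harp.size} HARP timestamps")
    return clock[:, sample_in_buffer], harp


# Synthetic check: 10 buffers of 100 clock ticks, HARP time = 5 s + ticks * 1e-4
clock = np.arange(1000)
harp = 5.0 + np.arange(0, 1000, 100) * 1e-4
onix_ds, harp_ds = downsample_clock_to_frames(clock, harp)
slope_ds, intercept_ds = np.polyfit(onix_ds, harp_ds, deg=1)

# The earlier (biased) approach: repeat each HARP time across its buffer, giving steps
slope_up, intercept_up = np.polyfit(clock.astype(float), np.repeat(harp, 100), deg=1)
```

In this toy case the downsampled fit recovers the true intercept of 5 s, while the stepped version shifts it by close to half a buffer's duration, which is the kind of bias described above.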

RoboDoig commented 2 months ago

Closing with PR #83

neurogears / vestibular-vr

Data synchronisation example #76