simonsobs / sotodlib

Simons Observatory: Time-Ordered Data processing library.
MIT License

sotodlib loads more data than the smurf dat file #262

Closed yuhanwyhan closed 2 years ago

yuhanwyhan commented 2 years ago

For a 3-minute TOD at a 200 Hz sample rate,

for example: (smurf-srv15: /data/smurf_data/20220508/crate1slot2/1651973602/outputs/1652039708.dat)

the original smurf .dat file has 36022 samples for a given channel:

timestamp, phase, mask, tes_bias = S.read_stream_data(noisefile, return_tes_bias=True)
bands, channels = np.where(mask != -1)

len(phase[2]) gives 36022

the equivalent sotodlib load gives 36066:

data_g3 = ls.load_file(['/data/timestreams/16520/crate1slot2/1652039711_000.g3'])

len(data_g3.timestamps) gives 36066
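For reference, a minimal sketch that puts the two loads side by side. This assumes an existing pysmurf SmurfControl instance S and that ls is sotodlib.io.load_smurf, as in the snippets above:

import numpy as np
from sotodlib.io import load_smurf as ls

noisefile = '/data/smurf_data/20220508/crate1slot2/1651973602/outputs/1652039708.dat'
# pysmurf load: S is an existing SmurfControl instance (assumed here)
timestamp, phase, mask, tes_bias = S.read_stream_data(noisefile, return_tes_bias=True)
bands, channels = np.where(mask != -1)

# sotodlib load of the matching G3 file
data_g3 = ls.load_file(['/data/timestreams/16520/crate1slot2/1652039711_000.g3'])

n_dat = phase.shape[-1]           # 36022 samples per channel in the .dat file
n_g3 = len(data_g3.timestamps)    # 36066 samples in the .g3 file
print(n_dat, n_g3, n_g3 - n_dat)  # difference: 44 samples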

mhasself commented 2 years ago

This is probably caused by how the Smurf software works, and is beyond our control -- @jlashner should comment.

If there is an intrinsic problem with those 44 extra samples (e.g. they contain invalid readings), that is a concern. But if it's just a matter of the G3 files having a few more valid samples than the .dat files, that might be something we just need to live with.

yuhanwyhan commented 2 years ago

Understood. I have one question about this. Suppose I use the HWP signal to demodulate the G3 file. If the 44 extra samples are distributed uniformly over the 3 minutes, I assume the demodulation will still work but with a larger error bar. But if those 44 extra samples are all at the beginning or the end of the 3 minutes, then the demodulation will be off, right?

I have seen G3 files include extra data at the beginning that are not physical, for example pton: smurf-srv15: '/data/timestreams/16426/crate1slot4/1642601798_000.g3' (this data also lives on simons1)

[Screenshot "Screen Shot 2022-05-09 at 10 31 45 AM": timestream plot showing a large non-physical spike at the start of the acquisition]

Is this a concern?

skhrg commented 2 years ago

I have seen that giant spike at the start of our observations before and have always assumed that it was some byproduct of starting to take detector data.

mhasself commented 2 years ago

> Understood. I have one question about this. Suppose I use the HWP signal to demodulate the G3 file. If the 44 extra samples are distributed uniformly over the 3 minutes, I assume the demodulation will still work but with a larger error bar. But if those 44 extra samples are all at the beginning or the end of the 3 minutes, then the demodulation will be off, right?

I think we'll find that the extra 44 samples are at the beginning and/or end of the acquisition, not somehow interleaved.
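One way to check would be to compare the time ranges of the two loads; a rough sketch follows. Note that whether the pysmurf timestamp array is directly usable, and in what units, depends on the timing configuration, so this is only the shape of the check:

# Rough sketch: if the .g3 time range starts earlier or ends later than
# the .dat range, the extra 44 samples are at the edges, not interleaved.
# The pysmurf timestamp units are an assumption and may need conversion.
print(data_g3.timestamps[0], data_g3.timestamps[-1])
print(timestamp[0], timestamp[-1])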

In any case, HWP (and pointing) are not a concern, because we are using timestamps to combine the data streams. We have no reason to doubt the timestamps in the frames.
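Schematically, the timestamp-based combination could look like the following. The hwp_times/hwp_angle arrays and the 4f carrier are placeholders for illustration, not an actual sotodlib API:

import numpy as np

# Hypothetical HWP stream, recorded separately with its own timestamps.
# Interpolating onto the detector timestamps makes the alignment immune
# to a few extra samples at the edges of the detector stream.
angle_at_det = np.interp(data_g3.timestamps, hwp_times, np.unwrap(hwp_angle))
demod_i = data_g3.signal * np.cos(4 * angle_at_det)  # 4f demodulation carrier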

> I have seen G3 files include extra data at the beginning that are not physical, for example pton: smurf-srv15: '/data/timestreams/16426/crate1slot4/1642601798_000.g3' (this data also lives on simons1)

> [Screenshot "Screen Shot 2022-05-09 at 10 31 45 AM": timestream plot showing a large non-physical spike at the start of the acquisition]

> Is this a concern?

This, to me, is a bigger concern. It will be annoying if we have to always drop the first few samples from .g3 files. I'd be interested to hear if this is often a problem (it's quite possible things have improved since January when your example was acquired).
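If it does remain a problem, a crude workaround is to cut the leading samples after loading. A sketch, assuming data_g3 is the AxisManager returned by load_file and that the cut length is chosen by inspection:

n_cut = 50  # illustrative value, not derived from the data
data_g3.restrict('samps', (n_cut, data_g3.samps.count))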

yuhanwyhan commented 2 years ago

I still have a concern about timestamps, as I have found that the timestamp in the dat file name does not agree with the G3 file name. However, I have not looked into the timestamps carefully enough to be sure it is not just a file-name issue. For example, for the smurf dat file 1652039708, the matching g3 file name is 1652039711_000.g3

mhasself commented 2 years ago

If one looks at the previous acquisition (16425/crate1slot4/1642579920_003.g3), it ends with:

>>> tod0.signal[10,-10:]
array([2.7077637, 2.7070928, 2.7078598, 2.7080514, 2.7080514, 2.7072845,
       2.7071886, 2.707572 , 2.7080514, 2.7080514], dtype=float32)

If you look at the start of the acquisition Yuhan mentioned (from over 5 hours later), it shows:

>>> tod1.signal[10,:10]
array([ 2.697793  ,  0.        , -0.16145149, -0.65913236, -0.79795766,
       -0.71569794, -0.71167123, -0.73190063, -0.7272028 , -0.7230802 ],
      dtype=float32)

So we see the old value from before (2.7) quickly settling over to the new value (-0.7). This might be due to the readout filter "remembering" what the old values were. If so ... that should probably be fixed or somehow dodged.
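As a generic illustration of that effect (not the actual SMuRF downsampling filter), an IIR filter whose internal state was left at the old level produces exactly this kind of settling transient:

import numpy as np
from scipy import signal

b, a = signal.butter(4, 0.05)        # stand-in low-pass filter
zi = signal.lfilter_zi(b, a) * 2.7   # filter state "remembering" the old value
new_data = -0.7 * np.ones(200)       # new acquisition at a different level
out, _ = signal.lfilter(b, a, new_data, zi=zi)
# out[0] starts near 2.7 and settles toward -0.7 over the filter timescale.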

mhasself commented 2 years ago

> I still have a concern about timestamps, as I have found that the timestamp in the dat file name does not agree with the G3 file name. However, I have not looked into the timestamps carefully enough to be sure it is not just a file-name issue. For example, for the smurf dat file 1652039708, the matching g3 file name is 1652039711_000.g3

Those are just the filenames; they only give the approximate start time. That is quite different from the timestamps in the frames.
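A quick sketch of the distinction, reusing data_g3 from above:

import os

fname = '/data/timestreams/16520/crate1slot2/1652039711_000.g3'
name_ctime = int(os.path.basename(fname).split('_')[0])  # 1652039711, approximate
# The per-sample timestamps inside the frames are the authoritative times:
print(name_ctime, data_g3.timestamps[0])  # these can differ by a few seconds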

kmharrington commented 2 years ago

I agree that if it's in the data file and we're loading what's in the file, then that's definitely not something we'll change in sotodlib.

But I'll add more info here because Jack and I have spent a while making sure we don't miss anything. SMuRF sets the .dat file name based on the action ctime, and the streamer sets the g3 file name based on the "session_id," which is the second at which the streamer is told to create a g3 file. These two times are basically always different because SMuRF is slow.

All these different times are archived in the .g3 files. The dump status frame has keys for the stream_id, session_id, smurf action, and action ctime. You can see in the G3tSmurf Observation building that I go through and make sure every one of these fields matches or is archived so it can be searched: https://github.com/simonsobs/sotodlib/blob/a3755c5bc1a904722d308b1d1e5b3118cc2086f6/sotodlib/io/load_smurf.py#L610
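As a hypothetical illustration of that cross-check (the real logic lives in the linked G3tSmurf code), the idea is just that the identifying fields from each file's status frame must agree:

def fields_consistent(status_entries):
    """status_entries: list of dicts holding the stream_id, session_id,
    action, and action ctime pulled from each file's dump status frame.
    Hypothetical helper for illustration, not part of sotodlib."""
    keys = ('stream_id', 'session_id', 'action', 'action_ctime')
    first = status_entries[0]
    return all(entry[k] == first[k] for entry in status_entries for k in keys)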

kmharrington commented 2 years ago

I'm going to close this since it's not something we're fixing here.