simpeg / aurora

software for processing natural source electromagnetic data
MIT License

Timing Errors (workaround for widescale tests) #289

Open kkappler opened 10 months ago

kkappler commented 10 months ago

We have never carefully studied timing errors. The structure of MTH5 makes severe timing errors extremely unlikely, but minor errors are still possible. These should eventually be addressed in MTH5 with a generic solution.

For now these are impeding the widescale test processing, and a workaround will be put into aurora.

This issue was encountered when processing Station ORF10 with remote reference ORG10.

Recall that the KernelDataset has a dataframe which lists the runs that need to be processed. When there is a remote reference station, the rows of the dataframe are organized such that rows 0 and 1 correspond to simultaneous data at the Local and Reference stations respectively. The same is true of rows 2 and 3, 4 and 5, and so on.

What goes wrong is that when loading the time series from mth5, the "paired" time series can have differing lengths. Even if the TS are off by a single sample (which is the only problem I have seen so far), the two STFT spectrograms can differ by one spectral estimate.
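A minimal sketch of how such mismatches could be flagged, assuming the paired rows are available as a list of dictionaries in local, remote, local, remote order. The keys `station`, `run` and `n_samples` and the values below are hypothetical and only for illustration; they are not taken from the KernelDataset dataframe.

```python
def find_length_mismatches(rows):
    """Return (local_row, remote_row, difference) for each unequal pair.

    `rows` is ordered local, remote, local, remote, ...
    The keys "station", "run" and "n_samples" are hypothetical.
    """
    mismatches = []
    for i in range(0, len(rows) - 1, 2):
        local, remote = rows[i], rows[i + 1]
        diff = local["n_samples"] - remote["n_samples"]
        if diff != 0:
            mismatches.append((local, remote, diff))
    return mismatches


# Made-up values for illustration only.
rows = [
    {"station": "local", "run": "001", "n_samples": 1000},
    {"station": "remote", "run": "001", "n_samples": 1000},
    {"station": "local", "run": "002", "n_samples": 2001},
    {"station": "remote", "run": "002", "n_samples": 2000},
]
mismatches = find_length_mismatches(rows)
for local, remote, diff in mismatches:
    print(local["run"], remote["run"], diff)
```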

For example, here is a run pair. A time series corresponding to ORF10, run=006 is loaded with start time 2006-09-07 22:50:22 and end time 2006-09-18 14:59:13.875000. This TS has 7377056 samples. A time series corresponding to ORG10, run=002 is loaded with identical start and end times, but the TS has 7377055 samples. We could call this a timing error, and technically it is, but we are looking at +/- one sample of (in this case) 8 Hz data over the course of 11 days, so it is trivial.

TS ORF10 006 (7377056,) 2006-09-07 22:50:22+00:00 2006-09-18 14:59:13.875000+00:00
TS ORG10 002 (7377055,) 2006-09-07 22:50:22+00:00 2006-09-18 14:59:13.875000+00:00
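As a sanity check on those counts, the number of samples implied by the start and end times can be computed directly. This is only a sketch: it assumes the reported end time is the timestamp of the last sample, and the function name `expected_n_samples` is made up for illustration.

```python
from datetime import datetime, timezone


def expected_n_samples(start, end, sample_rate):
    """Samples on a uniform grid that includes both start and end."""
    duration = (end - start).total_seconds()
    return round(duration * sample_rate) + 1


start = datetime(2006, 9, 7, 22, 50, 22, tzinfo=timezone.utc)
end = datetime(2006, 9, 18, 14, 59, 13, 875000, tzinfo=timezone.utc)

n_expected = expected_n_samples(start, end, sample_rate=8.0)
print(n_expected)            # 7377056
print(n_expected - 7377055)  # 1
```

Under that assumption the ORF10 length agrees with its time stamps, and the ORG10 series is the one that is a sample short.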

However, with the default windows of length 128 and a 96-sample advance (32-sample overlap), the ORF10 TS has exactly enough samples for 76844 spectral estimates, while the ORG10 TS is one sample shy and yields only 76843. The failure surfaces downstream in effective_degrees_of_freedom_weights, but the root cause is that we started with non-uniformly sampled data.
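The window arithmetic can be checked with a few lines. This sketch assumes only complete windows are kept (no padding of the final partial window); `n_windows` is an illustrative helper, not an aurora function.

```python
def n_windows(n_samples, window_length=128, advance=96):
    """Number of complete sliding windows that fit in n_samples."""
    if n_samples < window_length:
        return 0
    return (n_samples - window_length) // advance + 1


orf10 = n_windows(7377056)
org10 = n_windows(7377055)
print(orf10, org10)  # 76844 76843

# ORF10 fits exactly: nothing is left over after the last window.
leftover = (7377056 - 128) % 96
print(leftover)  # 0
```

Because the ORF10 length lands exactly on a window boundary, losing a single sample is enough to lose a whole spectral estimate.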

The proposed workaround is to drop FCs that do not match one another. There are various ways to go about this. A fairly straightforward solution would be to check that the timestamps from local and RR agree for each chunk, right before the spectrograms are merged across all runs.
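One way the timestamp check could look is sketched below. Plain dictionaries keyed by window start time stand in for the real spectrogram containers, and the names `window_start_times` and `keep_common_windows` are hypothetical. This is not the fix that went into aurora, only an illustration of the idea.

```python
from datetime import datetime, timedelta, timezone


def window_start_times(start, n_windows, advance=96, sample_rate=8.0):
    """Start time of each STFT window on a uniform grid."""
    step = timedelta(seconds=advance / sample_rate)
    return [start + i * step for i in range(n_windows)]


def keep_common_windows(local_fcs, remote_fcs):
    """Keep only the windows whose timestamps appear in both inputs."""
    common = sorted(local_fcs.keys() & remote_fcs.keys())
    local_kept = {t: local_fcs[t] for t in common}
    remote_kept = {t: remote_fcs[t] for t in common}
    return local_kept, remote_kept


start = datetime(2006, 9, 7, 22, 50, 22, tzinfo=timezone.utc)
# Toy case: the local station has one more window than the remote.
local_fcs = {t: f"local_fc_{i}"
             for i, t in enumerate(window_start_times(start, 5))}
remote_fcs = {t: f"remote_fc_{i}"
              for i, t in enumerate(window_start_times(start, 4))}

local_kept, remote_kept = keep_common_windows(local_fcs, remote_fcs)
print(len(local_kept), len(remote_kept))
```

Exact timestamp equality works here because both grids are built from the same start time and step. If the time axes were floating point or slightly offset, a tolerance-based comparison would probably be needed instead.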

The request dataframes are attached: ORF09_request_dataframe.csv, ORG10_request_dataframe.csv, tfk_dataset.csv, tfk_dataset_time_periods.csv

kkappler commented 10 months ago

N.B. My previous commit message incorrectly referenced #289; it should have been #293, sigh ...