Open jbrzusto opened 6 years ago
For the case Jeremy gave, the receiver's bootnum apparently got reset (twice): This isn't supposed to happen, but here are a couple of ways it might:
This kind of issue is why there are two boot-related fields in each receiver DB's files
table:
The reason to even record bootnums was to allow positioning orphaned data: if a receiver's GPS was unable to set the system clock, an entire boot session of data would have invalid dates. With bootnum, at least the block of data could be positioned between other boot sessions post hoc. However, this might be a poor hook on which to hang data processing, especially when, as here, file dates are correct. (Incorrect dates are all pre 2010.)
Fixes:
or
I made a poor decision in structuring processing around boot sessions, as the benefits are only for a few exceptional cases, and the problems are widespread.
There is more going on here. How does the following sequence of files get generated? (This set of files is responsible for a pair of overlapping batches.)
$ ls -al /sgm/file_repo/SG-3615BBBK0DB6/2017-11-20
total 1380
drwx------ 1 sg sg 3996 Jan 3 12:39 .
drwxrwxr-x 1 sg sg 4940 Jun 19 10:58 ..
-rwxr-xr-x 1 sg sg 53 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T00-46-50.6890P-all.txt.gz
-rwxr-xr-x 1 sg sg 108 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T01-48-54.5960P-all.txt.gz
-rwxr-xr-x 1 sg sg 135 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T03-01-07.6880P-all.txt.gz
-rwxr-xr-x 1 sg sg 151 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T04-04-49.4040P-all.txt.gz
-rwxr-xr-x 1 sg sg 92 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T05-05-35.1690P-all.txt.gz
-rwxr-xr-x 1 sg sg 305 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T06-51-34.1750P-all.txt.gz
-rwxr-xr-x 1 sg sg 386 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T07-57-42.0700P-all.txt.gz
-rwxr-xr-x 1 sg sg 91 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T09-04-53.8860P-all.txt.gz
-rwxr-xr-x 1 sg sg 210 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T10-09-29.3100P-all.txt.gz
-rwxr-xr-x 1 sg sg 7755 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T11-15-08.2450P-all.txt.gz
-rwxr-xr-x 1 sg sg 807 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T12-29-15.9940P-all.txt.gz
-rwxr-xr-x 1 sg sg 1342 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T13-37-56.5410P-all.txt.gz
-rwxr-xr-x 1 sg sg 5592 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T14-38-12.2490P-all.txt.gz
-rwxr-xr-x 1 sg sg 169 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T15-45-51.0710P-all.txt.gz
-rwxr-xr-x 1 sg sg 405 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T16-49-44.9280P-all.txt.gz
-rwxr-xr-x 1 sg sg 75 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T17-52-40.7070P-all.txt.gz
-rwxr-xr-x 1 sg sg 93 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T19-16-19.3440P-all.txt.gz
-rwxr-xr-x 1 sg sg 76 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T20-31-28.6900P-all.txt.gz
-rwxr-xr-x 1 sg sg 169 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T21-42-04.0290P-all.txt.gz
-rwxr-xr-x 1 sg sg 249 Nov 21 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T22-44-48.4160P-all.txt.gz
-rwxr-xr-x 1 sg sg 88 Nov 21 2017 18_Sehestedt-3615BBBK0DB6-000049-2017-11-20T23-50-51.7530P-all.txt.gz
-rwxr-xr-x 1 sg sg 1215 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000050-2017-11-20T16-18-06.2570P-all.txt.gz
-rwxr-xr-x 1 sg sg 62735 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000050-2017-11-20T17-19-01.4220P-all.txt.gz
-rwxr-xr-x 1 sg sg 18988 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000050-2017-11-20T18-19-02.2120P-all.txt.gz
-rwxr-xr-x 1 sg sg 293172 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000050-2017-11-20T19-19-02.4900P-all.txt.gz
-rwxr-xr-x 1 sg sg 166745 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000050-2017-11-20T20-19-02.6660P-all.txt.gz
-rwxr-xr-x 1 sg sg 61934 Nov 20 2017 18_Sehestedt-3615BBBK0DB6-000050-2017-11-20T21-19-03.7420P-all.txt.gz
-rwxr-xr-x 1 sg sg 306645 Nov 21 2017 18_Sehestedt-3615BBBK0DB6-000050-2017-11-20T22-19-07.9220P-all.txt.gz
-rwxr-xr-x 1 sg sg 390971 Nov 21 2017 18_Sehestedt-3615BBBK0DB6-000050-2017-11-20T23-19-08.0010P-all.txt
$ zgrep G *
This machine apparently rebooted, but then set its clock incorrectly. This is a suspicious unit, as there are no GPS records in any of the files, even invalid ones. This sensorgnome is not running in a standard configuration! Perhaps someone has configured a custom clock source, but in that case, it has not been done in a way that works correctly. The pattern of rebooting then setting the clock back several hours is repeated on other days for this receiver:
$ ls /sgm/file_repo/SG-3615BBBK0DB6/2017-11-23
18_Sehestedt-3615BBBK0DB6-000051-2017-11-23T09-43-03.1060P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000052-2017-11-23T19-54-02.5950P-all.txt.gz
18_Sehestedt-3615BBBK0DB6-000051-2017-11-23T10-43-04.1850P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000052-2017-11-23T21-02-22.4060P-all.txt
18_Sehestedt-3615BBBK0DB6-000051-2017-11-23T11-43-14.6390P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000053-2017-11-23T16-18-49.2990P-all.txt.gz
18_Sehestedt-3615BBBK0DB6-000051-2017-11-23T12-44-10.2460P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000053-2017-11-23T17-18-58.2740P-all.txt.gz
18_Sehestedt-3615BBBK0DB6-000051-2017-11-23T13-49-38.7670P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000053-2017-11-23T18-19-06.0500P-all.txt.gz
18_Sehestedt-3615BBBK0DB6-000051-2017-11-23T14-54-34.6580P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000053-2017-11-23T19-19-45.2310P-all.txt.gz
18_Sehestedt-3615BBBK0DB6-000051-2017-11-23T15-54-36.9250P-all.txt 18_Sehestedt-3615BBBK0DB6-000053-2017-11-23T20-19-55.6920P-all.txt.gz
18_Sehestedt-3615BBBK0DB6-000052-2017-11-23T16-19-43.9870P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000053-2017-11-23T21-20-22.9560P-all.txt.gz
18_Sehestedt-3615BBBK0DB6-000052-2017-11-23T17-21-50.0530P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000053-2017-11-23T22-38-04.4120P-all.txt.gz
18_Sehestedt-3615BBBK0DB6-000052-2017-11-23T18-50-03.0600P-all.txt.gz 18_Sehestedt-3615BBBK0DB6-000053-2017-11-23T23-43-49.5760P-all.txt.gz
For another receiver, there don't seem to be any bootnum inversions:
sqlite3 /sgm/recv/SG-1614BBBK1612.motus
SQLite version 3.16.2 2017-01-06 16:32:41
Enter ".help" for usage hints.
sqlite> select * from files as t1 join files as t2 on t1.fileID < t2.fileID where t1.ts > 1e9 and t2.ts > 1e9 and t1.ts < t2.ts and t1.monoBN > t2.monoBN;
sqlite>
So for this one, why are there 9 pairs of overlapping batches with different monoBN?
MariaDB [motus]> select * from tempBatchOverlaps where serno='SG-1614BBBK1612' limit 2;
+----------+----------+---------------+---------+---------+------------------+------------------+------------------+------------------+----------+----------+-------------+-------------+-----------------+
| batchID1 | batchID2 | motusDeviceID | monoBN1 | monoBN2 | tsStart1 | tsEnd1 | tsStart2 | tsEnd2 | numHits1 | numHits2 | motusJobID1 | motusJobID2 | serno |
+----------+----------+---------------+---------+---------+------------------+------------------+------------------+------------------+----------+----------+-------------+-------------+-----------------+
| 7796 | 7797 | 13 | 45 | 46 | 62.8074688911438 | 688.160468935966 | 62.8300286531448 | 87844.2100286484 | 0 | 0 | 3674 | 3675 | SG-1614BBBK1612 |
| 7796 | 7798 | 13 | 45 | 47 | 62.8074688911438 | 688.160468935966 | 63.7730296850204 | 19348.3570296764 | 0 | 0 | 3674 | 3676 | SG-1614BBBK1612 |
Here the tsStart
and tsEnd
fields have been recorded as timestamps from CLOCK_MONOTONIC
instead of CLOCK_REALTIME
(a receiver issue that is corrected elsewhere in the processing). So the overlap is not a problem. Moreover, these batches have no hits.
So the problem is not as widespread as the analysis earlier showed. Focussing on pairs of overlapping batches where:
tsStart
and tsEnd
are recorded in CLOCK_REALTIME
we have a much smaller set of problematic batches:
MariaDB [motus]> select sum(numHits1), sum(numHits2), count(distinct motusDeviceID), count(*), count(distinct batchID1) from tempBatchOverlaps where (numHits1 > 0 or numHits2 > 0) and (tsStart1 > 1e8 and tsStart2 > 1e8) and not serno regexp ".*[0-9]{4}BB0.*" and not serno regexp "^L.*" ;
+---------------+---------------+-------------------------------+----------+--------------------------+
| sum(numHits1) | sum(numHits2) | count(distinct motusDeviceID) | count(*) | count(distinct batchID1) |
+---------------+---------------+-------------------------------+----------+--------------------------+
| 63280060 | 72332114 | 62 | 35731 | 601 |
+---------------+---------------+-------------------------------+----------+--------------------------+
I am unsure if this is the same issue or related, but this is a case where a batch re-run included GPS data from a very different period than the hits. The original GPS readings were not included.
Reified from @Hussell's comment on MotusDev/Motus-TO-DO#236:
This shouldn't happen but has:
The two jobs that generated these batches are as so:
How many pairs of such incorrectly overlapping batches are there?
And the number of affected receivers is also large:
So the worry is justified!