jbrzusto / motusServer

R package to operate a server that processes data for https://motus.org
GNU General Public License v2.0

time-interleaved files with same monoBN causes earlier raw data records to be ignored #435

Open jbrzusto opened 5 years ago

jbrzusto commented 5 years ago

Reified from MotusDev/Motus-TO-DO#434. Somewhat like #320 and #407.
In this case, there are detections in files from the original boot session 3. But this is a beaglebone white SG that was redeployed with a fresh SD card and that had a bug whereby boot numbers did not increase, so there are several distinct boot sessions numbered 3. Unfortunately, some files from a later boot session 3 have earlier pre-GPS timestamps than some files from an earlier boot session 3.
These later files are read early and bump the tag finder's clock forward before any of the post-GPS timestamped files from the truly earlier boot session 3 can be processed. When those earlier-session files are finally read, their records are ignored because they appear to be time reversals.
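
The failure mode described above can be illustrated with a small sketch. This is not the tag finder's actual code. It assumes that files are fed in order of the timestamp in their name, that the finder keeps a single clock that only moves forward, and that records inside a pre-GPS-named file can carry later (corrected) timestamps. `RawFile`, `run_finder`, and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawFile:
    label: str               # which physical boot session wrote the file (illustrative)
    name_ts: float           # timestamp taken from the file name; used for ordering
    record_ts: List[float]   # timestamps of the records inside the file


def run_finder(files):
    """Process files in name-timestamp order with a forward-only clock.

    Records older than the clock are treated as time reversals and ignored.
    """
    clock = float("-inf")
    kept, ignored = [], []
    for f in sorted(files, key=lambda f: f.name_ts):
        for ts in f.record_ts:
            if ts < clock:
                ignored.append((f.label, ts))   # time reversal: dropped
            else:
                clock = ts                      # clock is bumped forward
                kept.append((f.label, ts))
    return kept, ignored


files = [
    # truly earlier session 3, file named after the GPS fix
    RawFile("earlier session 3", name_ts=2000.0, record_ts=[2000.0, 2010.0]),
    # truly later session 3, file named before the GPS fix (assumed values)
    RawFile("later session 3", name_ts=100.0, record_ts=[5000.0, 5010.0]),
]

kept, ignored = run_finder(files)
print("kept:", kept)
print("ignored:", ignored)
```

In this sketch the later session's file sorts first because of its pre-GPS name, pushes the clock to 5010, and both records from the truly earlier session end up in `ignored`.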

This whole situation needs a rethink, as further elaborated in the issues linked above.

jbrzusto commented 5 years ago

This problem requires a dive into the deep end of sensorgnome / motus design and implementation.

Here are notes that sketch out enough (hopefully) background to guide a solution.

Data Flow

Here are the different ways the tag finder can be called to process some files:

  1. old files: all files from a boot session are re-run in temporal sequence.

  2. new files in a new boot session: when new files arrive, they are grouped by boot session, and the files in each group are processed in a single run of the tag finder (i.e. one run per boot session).

  3. new files in an existing boot session: as an optimization, the tag finder always saves its internal state at the end of a run, so that new files for an existing boot session can be processed incrementally. This is how we avoid quadratic growth in processing time.

So a single run of the tag finder handles files from a single boot session (and not necessarily all of those files). This single run produces output called a batch, which consists of individual tag detections (hits) grouped into runs (which are on the same antenna).
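
A minimal sketch of the flow above, under stated assumptions: new files are grouped by boot session, each group gets one run, and the state saved at the end of a run is used to resume the same boot session later. `run_tag_finder` here is a stand-in that only counts files and tracks a clock; the dictionary keys, `saved_state`, and `process_new_files` are hypothetical names, not motusServer's API.

```python
from collections import defaultdict


def group_by_boot_session(files):
    """Group file descriptions by boot session, each group in temporal order."""
    groups = defaultdict(list)
    for f in files:
        groups[f["boot"]].append(f)
    for group in groups.values():
        group.sort(key=lambda f: f["ts"])
    return dict(groups)


def run_tag_finder(files, state=None):
    """Stand-in for one tag finder run: returns (batch, state to save)."""
    state = dict(state) if state else {"files_seen": 0, "clock": float("-inf")}
    batch = []
    for f in files:
        state["files_seen"] += 1
        state["clock"] = max(state["clock"], f["ts"])
        batch.append(f["name"])
    return batch, state


saved_state = {}  # boot session -> state saved at the end of the last run


def process_new_files(files):
    """One run per boot session, resuming from saved state when it exists."""
    batches = {}
    for boot, group in group_by_boot_session(files).items():
        batch, saved_state[boot] = run_tag_finder(group, saved_state.get(boot))
        batches[boot] = batch
    return batches


first = process_new_files([
    {"name": "a", "boot": 3, "ts": 10.0},
    {"name": "b", "boot": 4, "ts": 20.0},
])
second = process_new_files([{"name": "c", "boot": 3, "ts": 30.0}])
print(first, second, saved_state[3])
```

The second call only touches file `c`, yet the state for boot session 3 reflects both `a` and `c`. That resumption is what keeps the cost per upload proportional to the new files rather than to the whole boot session.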

The problem: boot sessions aren't monotonic

The decision to use boot sessions to organize data was made when almost all SG data were coming from beaglebone-black (BBBK) sensorgnomes, which have internal flash memory where we can store the boot count. This works, but:

So overall, the fact that N > M does not necessarily mean that a file (labelled as being) from boot session 'N' was really written later than a file from boot session 'M'.
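
As a rough illustration of that statement, the sketch below lists pairs of files whose boot-session labels disagree with the order in which they were written. It assumes a trustworthy write time is available for each file (for example from a GPS-corrected clock), which is exactly what may be missing in practice. `boot_order_violations` and the sample values are hypothetical.

```python
from itertools import combinations


def boot_order_violations(files):
    """Return pairs (lo, hi) where hi has the larger boot number
    but the earlier write time.

    files: list of (boot_session, write_time) tuples.
    """
    violations = []
    for a, b in combinations(files, 2):
        lo, hi = sorted((a, b), key=lambda f: f[0])
        if lo[0] < hi[0] and hi[1] < lo[1]:
            violations.append((lo, hi))
    return violations


files = [(3, 100.0), (4, 50.0), (5, 200.0)]
print(boot_order_violations(files))
```

Here the file labelled boot session 4 was written before the file labelled boot session 3, so that pair is reported. Files sharing a boot number, as in this issue, are not flagged by this check and need separate handling.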

The consequences of non-monotonic boot sessions

The Catch

Possible ways forward

These aren't necessarily mutually exclusive.

leberrigan commented 5 years ago

Thanks for laying this out clearly. Do you have any further thoughts on moving forward? Should I assign this issue to somebody?

jbrzusto commented 5 years ago

Sorry, way behind on stuff. If someone else wants to take it on, great. It is a substantial chunk of work, so best to coordinate efforts on it to avoid duplication.

joeybernard commented 5 years ago

I should be diving into this soon. Just dealing with a few other items first.