Open jbrzusto opened 5 years ago
This problem requires a dive into the deep end of sensorgnome / motus design and implementation.
Here are notes that sketch out enough (hopefully) background to guide a solution.
a sensorgnome (SG) writes pulse detection data to a sequence of files
an SG begins a new file every hour, or every megabyte of uncompressed data, whichever comes first; compressed and uncompressed files are written in tandem, with the uncompressed file deleted upon switching to a new file
filenames include the SG serial number, timestamp, and boot session count (the latter is supposed to increase by one each time the SG reboots, but this isn't always the case)
when users download files from an SG, they might get a partial copy of the last file (i.e. the file transfer process is not sync'd with file writing)
generally, batches of files from an SG reach the motus server in increasing temporal order, but not always (sometimes files are located later, because some SGs have more than one onboard storage location, which users are not always aware of; or apparently corrupt SD cards are later scanned for data)
pulses from data files must be run against a full database of active tags and their pulse patterns in order to assemble them into tag detections; a pulse is deemed to belong to at most one tag
the tag database exists only on the motus server
the interpretation of an individual pulse depends on context:
the tag finder (find_tags_motus) uses a "greedy" approach to extract tag detections from pulse data in a single pass ("greedy" means that the first confirmed tag detection sequence that is compatible with a pulse gets to claim it).
it's not feasible to re-run the tag finder on the entire pulse dataset for an SG every time we receive new data from it; this is especially true for networked receivers, from which we sync data hourly: the cumulative time spent processing data from each receiver would grow quadratically over time if we reprocessed from the beginning with each new batch of files.
instead, we split the sequence of files from an SG into time periods, and when new data arrive from an SG, we only re-run those time periods for which there are new files.
the time periods we chose are "boot sessions" (i.e. the maximal period of time during which a receiver ran without a reboot).
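To put rough numbers on the quadratic-growth concern above, here is a minimal sketch. It assumes one file per hourly sync and a unit cost per file processed; both are simplifications for illustration, not measurements from the motus server, and the function names are made up.

```python
def full_reprocess_cost(n_batches: int) -> int:
    """Files processed if every new batch triggers a rerun from the first file."""
    # Batch k forces a rerun of all k batches received so far: 1 + 2 + ... + n.
    return sum(range(1, n_batches + 1))


def incremental_cost(n_batches: int) -> int:
    """Files processed if each batch is handled exactly once."""
    return n_batches


hours_per_year = 24 * 365
print(full_reprocess_cost(hours_per_year))  # 38373180
print(incremental_cost(hours_per_year))     # 8760
```

Under these assumptions, a year of hourly syncs costs about 38 million file passes when reprocessing from the beginning, versus 8760 when each file is processed once.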
Here are the different ways the tag finder can be called to process some files:
old files: all files from a boot session are re-run in temporal sequence.
new files in a new boot session: when new files arrive, they are grouped by boot session, and files in each are processed in a single run of the tag finder (i.e. one run per boot session)
new files in an existing boot session: as an optimization, the tag finder always saves its internal state at the end of a run, so that new files for an existing boot session can be processed incrementally. This is how we avoid quadratic growth in processing time.
So a single run of the tag finder handles files from a single boot session (and not necessarily all of those files). This single run produces output called a batch, which consists of individual tag detections (hits) grouped into runs (which are on the same antenna).
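The "one run per boot session" grouping can be sketched as follows. This is only an illustration: `PulseFile`, its fields, and the serial `SG-A` are hypothetical stand-ins for whatever is parsed out of the real filenames, and the actual motusServer code may organize this differently.

```python
from collections import defaultdict
from typing import NamedTuple


class PulseFile(NamedTuple):
    """Hypothetical record of the fields encoded in an SG filename."""
    serial: str       # receiver serial number
    boot_num: int     # boot session count as labelled in the filename
    timestamp: float  # file timestamp, seconds since the epoch


def group_by_boot_session(files):
    """Return {(serial, boot_num): files sorted by timestamp}."""
    sessions = defaultdict(list)
    for f in files:
        sessions[(f.serial, f.boot_num)].append(f)
    # Within a session, files are fed to the tag finder in temporal order.
    return {key: sorted(group, key=lambda f: f.timestamp)
            for key, group in sessions.items()}


files = [
    PulseFile("SG-A", 12, 1500003600.0),
    PulseFile("SG-A", 11, 1499990000.0),
    PulseFile("SG-A", 12, 1500000000.0),
]
for key, group in sorted(group_by_boot_session(files).items()):
    print(key, [f.timestamp for f in group])  # one tag finder run per key
```

Note that the grouping trusts the labelled boot number, so it is only as reliable as that label.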
The decision to use boot sessions to organize data was made when almost all SG data were coming from beaglebone-black (BBBK) sensorgnomes, which have internal flash memory where we can store the boot count. This works, but:
beaglebone-white (BBW) sensorgnomes (the original model, of which there are still maybe a dozen gathering data) and raspberry-pi sensorgnomes (most new SGs in the past couple of years) do not have this internal persistent storage, and as users run through different SD cards in the same unit, boot counts get reset or mixed up between receivers
there was a bug in incrementing the boot count (I know; pathetic; how do you fail to implement ++x?) in at least one version of SG software, even on BBBK SGs.
some users appear to have customized their SG's software in ways that mess with the boot count
So overall, the fact that N > M does not necessarily mean that a file (labelled as being) from boot session 'N' was really written later than a file from boot session 'M'.
the first few files recorded by an SG after it boots often have incorrect timestamps: the SG boots thinking it is the year 2000, but real SG timestamps only begin in 2010 or later. Eventually, the GPS sets the system clock, and a correct timestamp is written, so the tag finder uses this to back-correct those pre-2010 timestamps.
so if the system boots at different times but with the same boot number, there will be multiple files labelled with pre-GPS timestamps and the same boot numbers. One of these files eventually has a valid timestamp, and the tag finder will use that to back-correct the preceding timestamps.
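A minimal sketch of the back-correction idea for a single boot, under loud assumptions: records arrive in the order they were written, any timestamp before 2010 is pre-GPS, and negligible real time passed between the last pre-GPS record and the first valid one. The name `back_correct` and the offset rule are illustrative, not the tag finder's actual logic.

```python
from datetime import datetime, timezone

# Assumption: any timestamp before 2010 was written before the GPS set the clock.
GPS_VALID_AFTER = datetime(2010, 1, 1, tzinfo=timezone.utc).timestamp()


def back_correct(timestamps):
    """Shift leading pre-GPS timestamps using the first jump to a valid time."""
    for i, ts in enumerate(timestamps):
        if ts >= GPS_VALID_AFTER:
            if i == 0:
                return list(timestamps)  # already valid: nothing to correct
            # Assumed offset: gap between the last pre-GPS record and the
            # first valid one, treating them as (nearly) simultaneous.
            offset = ts - timestamps[i - 1]
            return [t + offset for t in timestamps[:i]] + list(timestamps[i:])
    return list(timestamps)  # no valid timestamp seen: cannot correct


# Two records stamped in "year 2000", then the GPS fix arrives.
print(back_correct([946684810.0, 946684870.0, 1500000000.0]))
# [1499999940.0, 1500000000.0, 1500000000.0]
```

The sketch handles one boot only. When several boots share a boot number, the pre-GPS records of different boots are interleaved under one label, and a single offset like this one cannot be right for all of them.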
The Catch
calculate monotonic boot numbers for each receiver; there is some code in the motusServer R package that does this, but it hasn't been integrated into normal file processing
re-organize file processing around some other marker, e.g. every two-week period
These aren't necessarily mutually exclusive.
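Both options can be sketched in a few lines. This is not the motusServer code; it assumes each boot session has already been separated out and has a first GPS-valid timestamp, and all names here are hypothetical.

```python
TWO_WEEKS = 14 * 86400  # seconds


def monotonic_boot_numbers(sessions):
    """sessions: list of (labelled_boot_num, first_valid_timestamp).

    Returns (labelled_boot_num, first_valid_timestamp, monotonic_num) tuples,
    numbered 1..n in order of first valid timestamp.
    """
    ordered = sorted(sessions, key=lambda s: s[1])
    return [(label, ts, n) for n, (label, ts) in enumerate(ordered, start=1)]


def two_week_period(timestamp):
    """Index of the fixed two-week window (counted from the epoch) holding timestamp."""
    return int(timestamp // TWO_WEEKS)


# Three sessions, two of them labelled "3" by a receiver with a stuck boot count.
print(monotonic_boot_numbers([(3, 300.0), (3, 100.0), (2, 200.0)]))
# [(3, 100.0, 1), (2, 200.0, 2), (3, 300.0, 3)]
```

Either way, the ordering comes from GPS-valid time rather than from the labelled boot count, which is the part that cannot be trusted.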
Thanks for laying this out clearly. Do you have any further thoughts on moving forward? Should I assign this issue to somebody?
Sorry, way behind on stuff. If someone else wants to take it on, great. It is a substantial chunk of work, so best to coordinate efforts on it to avoid duplication.
I should be diving into this soon. Just dealing with a few other items first.
Reified from MotusDev/Motus-TO-DO#434. Somewhat like #320 and #407.
In this case, there are detections in files from the original boot session 3, but because this is a beaglebone white SG that was redeployed with a fresh SD card, and which had a bug whereby boot numbers did not increase, there are several distinct boot sessions numbered 3. And unfortunately, there are files from a later boot session 3 which have earlier pre-GPS timestamps than some such files from an earlier boot session 3.
These later files are read early and bump the tag finder's clock forward before any of the post-GPS timestamped files from the truly earlier boot session 3 can be processed. When the latter are seen, their records are ignored because they contain time reversals.
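A toy model of that failure mode, assuming only that the tag finder keeps a clock equal to the latest timestamp it has accepted and ignores any record older than that clock. The real find_tags_motus logic is more involved; the function and labels here are made up for illustration.

```python
def drop_time_reversals(records):
    """records: iterable of (timestamp, label) in the order they are read.

    Returns (kept, dropped); a record is dropped if it is older than the
    latest timestamp accepted so far.
    """
    clock = float("-inf")
    kept, dropped = [], []
    for ts, label in records:
        if ts < clock:
            dropped.append((ts, label))  # time reversal: ignored
        else:
            clock = ts
            kept.append((ts, label))
    return kept, dropped


# Files from the truly later session are read first and advance the clock.
records = [(2000.0, "later session 3"), (1000.0, "earlier session 3")]
kept, dropped = drop_time_reversals(records)
print(kept)     # [(2000.0, 'later session 3')]
print(dropped)  # [(1000.0, 'earlier session 3')]
```

Read in the true order, both records would be kept, so the loss comes entirely from the read order implied by the shared boot number.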
This whole situation needs a rethink, as further elaborated in the issues linked above.