jbrzusto closed this issue 7 years ago.
Branch name for this fix: `batchRuns_table`
The new schema will require changes to:

- `DBFiler::end_run`

Counts of mentions of `batchIDbegin` or `batchIDend`:

| file | count |
|---|---:|
| dataServer.R | 37 |
| deleteFromMotus.R | 1 |
| dropBatchesFromTransfer.R | 3 |
| ensureTagProjDB.R | 2 |
| makeTagProjDB.R | 1 |
| purgeFromMotus.R | 1 |
| pushToMotus.R | 11 |
| sgEnsureDBTables.R | 5 |
Counts of mentions of `batchIDbegin` or `batchIDend`:

| file | count |
|---|---:|
| ensureDBTables.R | 2 |
| srvRunsForReceiverProject.R | 2 |
| srvRunsForTagProject.R | 2 |
To minimize code and DB churn, don't change the receiver DBs at this point. Just add entries to the `batchRuns` table in the master DB in `motusServer::pushToMotus`, and make appropriate improvements to the queries in `motusServer::dataServer`. Also, drop `runUpdates` in favour of simply updating the `runs` table directly.
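A minimal sketch of that bookkeeping, written with Python's `sqlite3` so it runs on its own. The table layouts and the `push_batch` helper are hypothetical simplifications, not the actual `motusServer` code (which is R): for each run a batch touches, it creates or updates the row in `runs` directly, with no intermediate `runUpdates` step, and records the `(batchID, runID)` pair in `batchRuns`.

```python
import sqlite3

# Hypothetical, simplified master-DB tables; the real schema has more columns.
SCHEMA = """
CREATE TABLE runs (
    runID        INTEGER PRIMARY KEY,
    batchIDbegin INTEGER NOT NULL,
    batchIDend   INTEGER,           -- NULL while the run is unfinished
    len          INTEGER NOT NULL   -- number of hits so far
);
CREATE TABLE batchRuns (
    batchID INTEGER NOT NULL,
    runID   INTEGER NOT NULL,
    PRIMARY KEY (batchID, runID)    -- also serves as the index on batchID
);
CREATE INDEX batchRuns_runID ON batchRuns (runID);
"""

def push_batch(conn, batch_id, runs):
    """Record which runs a batch touches and update those runs in place.

    `runs` is a list of (runID, len, finished) tuples describing each run
    as of the end of this batch (a hypothetical input format).
    """
    with conn:  # one transaction per batch
        for run_id, length, finished in runs:
            # Create the run if this batch is its first; otherwise keep batchIDbegin.
            conn.execute(
                "INSERT OR IGNORE INTO runs (runID, batchIDbegin, batchIDend, len) "
                "VALUES (?, ?, NULL, ?)",
                (run_id, batch_id, length),
            )
            # Update the run directly instead of queuing a runUpdates record.
            conn.execute(
                "UPDATE runs SET len = ?, batchIDend = ? WHERE runID = ?",
                (length, batch_id if finished else None, run_id),
            )
            # Many-to-many link: this run overlaps this batch.
            conn.execute(
                "INSERT OR IGNORE INTO batchRuns (batchID, runID) VALUES (?, ?)",
                (batch_id, run_id),
            )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
push_batch(conn, 1, [(10, 5, False)])
push_batch(conn, 2, [(10, 12, False), (11, 3, True)])
push_batch(conn, 3, [(10, 20, True)])

print(conn.execute("SELECT runID, batchIDbegin, batchIDend, len FROM runs ORDER BY runID").fetchall())
# → [(10, 1, 3, 20), (11, 2, 2, 3)]
print(conn.execute("SELECT runID FROM batchRuns WHERE batchID = 2 ORDER BY runID").fetchall())
# → [(10,), (11,)]
```

Keying `batchRuns` on `(batchID, runID)` makes the link insert idempotent, so re-pushing a batch does not duplicate rows.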
The receiver DB schema is faulty: if data batches are not processed in temporal order, which can easily happen when users send data they'd missed, the test for a run touched by batch `B`, `batchIDbegin = B or batchIDend = B or (batchIDend is null and batchIDbegin < B)`, is wrong. If `R` is an unfinished run with `R.batchIDbegin = B2`, where `B2.batchID < B.batchID` and `B2.tsStart > B.tsEnd`, then `R` satisfies the third clause even though it is in `B`'s future.
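The failure can be reproduced with a toy database. This sketch uses Python's `sqlite3` with hypothetical, stripped-down `batches` and `runs` tables: batch 2 (`B2`) is processed before batch 3 (`B`) but covers a later time span, and the unfinished run `R` begins in `B2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (batchID INTEGER PRIMARY KEY, tsStart REAL, tsEnd REAL);
CREATE TABLE runs (runID INTEGER PRIMARY KEY, batchIDbegin INTEGER, batchIDend INTEGER);
""")
# Hypothetical data: B2 has the lower batchID but holds *later* data than B,
# e.g. because older, missed files were sent and processed afterwards.
conn.executemany("INSERT INTO batches VALUES (?, ?, ?)",
                 [(2, 2000.0, 3000.0),   # B2
                  (3, 1000.0, 1500.0)])  # B
# Run R began in B2 and is unfinished (batchIDend IS NULL).
conn.execute("INSERT INTO runs VALUES (1, 2, NULL)")

# The batch-ID-based test for "run touched by batch B".
FAULTY_TEST = """
SELECT runID FROM runs
WHERE batchIDbegin = :B
   OR batchIDend = :B
   OR (batchIDend IS NULL AND batchIDbegin < :B)
"""
touched = conn.execute(FAULTY_TEST, {"B": 3}).fetchall()
print(touched)  # → [(1,)]: R is reported as touched by B via the third clause

b2_start = conn.execute("SELECT tsStart FROM batches WHERE batchID = 2").fetchone()[0]
b_end = conn.execute("SELECT tsEnd FROM batches WHERE batchID = 3").fetchone()[0]
print(b2_start > b_end)  # → True: R starts after B has ended
```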
So we're back to needing `batchRuns` in the receiver DB. And in that case, why do we need `batchIDbegin` and `batchIDend` at all? `tsBegin` and `tsEnd` would be more useful, along with a boolean `done`.

The `new_server` branch, which will become `master`, implements this.
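For comparison, here is a hypothetical receiver-DB layout along those lines, again as a self-contained SQLite sketch rather than the schema actually on the `new_server` branch: `runs` carries `tsBegin`, `tsEnd` and `done`, and `batchRuns` alone decides which runs a batch touches, so the out-of-order scenario described above no longer produces a false match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (batchID INTEGER PRIMARY KEY, tsStart REAL, tsEnd REAL);
-- Runs described by time span and a completion flag instead of batch IDs.
CREATE TABLE runs (
    runID   INTEGER PRIMARY KEY,
    tsBegin REAL NOT NULL,
    tsEnd   REAL NOT NULL,
    done    INTEGER NOT NULL DEFAULT 0  -- boolean stored as 0/1
);
-- Which runs overlap which batches (many-to-many).
CREATE TABLE batchRuns (
    batchID INTEGER NOT NULL,
    runID   INTEGER NOT NULL,
    PRIMARY KEY (batchID, runID)
);
CREATE INDEX batchRuns_runID ON batchRuns (runID);
""")
# Same hypothetical scenario: B2 (batchID 2) holds later data than B (batchID 3).
conn.executemany("INSERT INTO batches VALUES (?, ?, ?)",
                 [(2, 2000.0, 3000.0), (3, 1000.0, 1500.0)])
conn.execute("INSERT INTO runs VALUES (1, 2100.0, 2900.0, 0)")  # unfinished run R
conn.execute("INSERT INTO batchRuns VALUES (2, 1)")             # R overlaps only B2

def runs_touched_by(conn, batch_id):
    """Runs touched by a batch, read from batchRuns; processing order is irrelevant."""
    return conn.execute(
        "SELECT r.runID, r.tsBegin, r.tsEnd, r.done "
        "FROM batchRuns AS br JOIN runs AS r ON r.runID = br.runID "
        "WHERE br.batchID = ? ORDER BY r.runID",
        (batch_id,),
    ).fetchall()

print(runs_touched_by(conn, 3))  # → []
print(runs_touched_by(conn, 2))  # → [(1, 2100.0, 2900.0, 0)]
```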
Suppose we have batches from multiple receivers, and a run on one receiver that spans three of its batches. Under the current schema, these might appear like so in final form:
After only batches b1 and b2 have been received, the client-side DB looks like this:
If the next update occurs after batch b3 has been processed, here's what we'd want the client-side to look like:
However, the only way to determine that the record for run r1 needs to be updated is:
That is, we must examine every hit in batch b3 to find the unique set of runs involved in it.
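A scan like the one described amounts to a `DISTINCT` over the hits of the batch. This is a minimal sketch using Python's `sqlite3` with a hypothetical `hits` table that has only the columns needed here; the real client DB has many more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, minimal client-side hits table.
conn.execute("CREATE TABLE hits (hitID INTEGER PRIMARY KEY, runID INTEGER, batchID INTEGER)")
conn.executemany(
    "INSERT INTO hits (runID, batchID) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 3),  # run r1 spans batches b1..b3
     (2, 3), (2, 3)],                 # another run, only in b3
)

def runs_in_batch_by_scan(conn, batch_id):
    # Without a batchRuns table, the runs touched by a batch can only be
    # derived from its hits, so every hit in the batch has to be visited.
    rows = conn.execute(
        "SELECT DISTINCT runID FROM hits WHERE batchID = ? ORDER BY runID",
        (batch_id,),
    ).fetchall()
    return [run_id for (run_id,) in rows]

print(runs_in_batch_by_scan(conn, 3))  # → [1, 2]
```

The cost grows with the number of hits in the batch rather than the number of runs, which is what a per-batch run list avoids.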
A better schema would include a new table, `batchRuns`, with indexes on `runID` and `batchID`, recording the many-to-many relation between these two columns to track which runs overlap which batches. The hits from run `R` within batch `B` could then be queried by:
All hits from batch B could be obtained as
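One way both lookups might be written, sketched with Python's `sqlite3` against a hypothetical `batchRuns` table keyed on `(batchID, runID)` with a secondary index on `runID`, plus a `hits` table indexed on `(batchID, runID)`. The column names, indexes and helper functions are assumptions; if `hits` already carries `batchID`, a plain `WHERE batchID = ?` on `hits` would also return all hits from a batch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hits (hitID INTEGER PRIMARY KEY, runID INTEGER NOT NULL,
                   batchID INTEGER NOT NULL, ts REAL NOT NULL);
CREATE INDEX hits_batch_run ON hits (batchID, runID);
CREATE TABLE batchRuns (batchID INTEGER NOT NULL, runID INTEGER NOT NULL,
                        PRIMARY KEY (batchID, runID));
CREATE INDEX batchRuns_runID ON batchRuns (runID);
""")
# Hypothetical data: run 1 spans batches 1-3; run 2 lies only in batch 3.
conn.executemany("INSERT INTO hits (runID, batchID, ts) VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (1, 2, 20.0), (1, 3, 30.0), (2, 3, 31.0)])
conn.executemany("INSERT INTO batchRuns VALUES (?, ?)",  # (batchID, runID)
                 [(1, 1), (2, 1), (3, 1), (3, 2)])

def hits_of_run_in_batch(conn, run_id, batch_id):
    # Rows come back only if batchRuns links the run to the batch; the hits
    # themselves are then located through the (batchID, runID) index.
    return conn.execute(
        "SELECT h.hitID, h.ts FROM batchRuns AS br "
        "JOIN hits AS h ON h.batchID = br.batchID AND h.runID = br.runID "
        "WHERE br.batchID = ? AND br.runID = ? ORDER BY h.ts",
        (batch_id, run_id),
    ).fetchall()

def hits_in_batch(conn, batch_id):
    # All hits from a batch, grouped by the runs that batchRuns lists for it.
    return conn.execute(
        "SELECT br.runID, h.hitID, h.ts FROM batchRuns AS br "
        "JOIN hits AS h ON h.batchID = br.batchID AND h.runID = br.runID "
        "WHERE br.batchID = ? ORDER BY br.runID, h.ts",
        (batch_id,),
    ).fetchall()

print(hits_of_run_in_batch(conn, 1, 3))  # → [(3, 30.0)]
print(hits_in_batch(conn, 3))            # → [(1, 3, 30.0), (2, 4, 31.0)]
```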