meedstrom opened this issue 3 years ago
Recent thought: the past-classification model could be implemented if you consider carving up the last 24-48 hours into small time blocks. These blocks are most naturally delineated by the times of buffer change (not `buffer_kind` change, just buffer change!). So some blocks would last mere seconds, and a rare few could last hours. During such a block, all variables stay fixed -- they only change when a buffer change occurs. You can assume these variables are correct the entire time within a block.
Of course if you have a buffer under focus for 3 hours but are idle for 1h40 of that, then it was only under focus for 1h20. Perhaps we'll insert idle blocks into the dataset as a sort of pseudo-buffer to reflect the fact that the user isn't focusing on any buffer.
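The idle-as-pseudo-buffer idea could look like this minimal sketch (the `"idle"` marker and the `(start, end, buffer)` tuple representation are assumptions for illustration):

```python
def insert_idle(block, idle_intervals):
    """Split a (start, end, buffer) focus block around idle stretches,
    inserting an "idle" pseudo-buffer so that no-focus time is represented.
    idle_intervals: sorted, non-overlapping (start, end) pairs within the block."""
    start, end, buf = block
    pieces, cursor = [], start
    for i0, i1 in idle_intervals:
        if cursor < i0:
            pieces.append((cursor, i0, buf))    # genuine focus before the idle
        pieces.append((i0, i1, "idle"))         # pseudo-buffer for idle time
        cursor = i1
    if cursor < end:
        pieces.append((cursor, end, buf))       # remaining focus after the idle
    return pieces

# A 3-hour block with 1h40 of idle in the middle: focus time shrinks to 1h20.
pieces = insert_idle((0, 10800, "notes.org"), [(3000, 9000)])
focus = sum(e - s for s, e, b in pieces if b != "idle")
```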
In my experience, the user will often land in a situation where they are rapidly switching between buffers that belong to different `buffer_kind` categories. If we naively classify these without any regard to the rapid switching, we'll end up with Org clock lines looking like this:
```
CLOCK: [2017-12-11 Mon 23:35]--[2017-12-11 Mon 23:36] => 0:01
CLOCK: [2017-12-11 Mon 23:33]--[2017-12-11 Mon 23:34] => 0:01
CLOCK: [2017-12-11 Mon 23:30]--[2017-12-11 Mon 23:31] => 0:01
CLOCK: [2017-12-11 Mon 23:27]--[2017-12-11 Mon 23:28] => 0:01
CLOCK: [2017-12-11 Mon 23:26]--[2017-12-11 Mon 23:27] => 0:01
```
That is not nice, and clutters up the agenda log too. If we don't have a model component that will handle rapid switching elegantly, I'm inclined to just carve up the day into 30-minute blocks and be content with classifying each as the average of what happened within them. Such a model has its own complexities -- many variables are no longer fixed within time blocks, and we have to consider what's a good way of weighting the averages of each variable.
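The 30-minute-block fallback could be sketched like so: weight each `buffer_kind` by the seconds it occupied within a bin, and label the bin with the duration-weighted majority (all names here are illustrative assumptions):

```python
from collections import defaultdict

BIN = 30 * 60  # 30-minute bins, in seconds

def classify_bins(blocks):
    """Assign each 30-minute bin the duration-weighted majority buffer_kind.
    blocks: (start, end, kind) tuples; a block straddling a bin boundary has
    its duration split proportionally between the bins it touches."""
    weight = defaultdict(lambda: defaultdict(float))  # bin -> kind -> seconds
    for start, end, kind in blocks:
        t = start
        while t < end:
            b = int(t // BIN)
            edge = min(end, (b + 1) * BIN)  # clip block to this bin's end
            weight[b][kind] += edge - t
            t = edge
    return {b: max(kinds, key=kinds.get) for b, kinds in weight.items()}

# Rapid switching inside bin 0 no longer yields one-minute clock fragments;
# each bin is labelled by whichever kind dominated it.
labels = classify_bins([(0, 100, "prose"), (100, 160, "code"), (160, 2000, "prose")])
```

This sidesteps the clutter, at the cost of the weighting question raised above: here the weights are simply seconds of focus, which is only one defensible choice.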
For background, see the README for all the theory.
**Current questions on the stats theory**

Re. the model for realtime guesses:

- The user will typically run `org-clock-out` when done (whereupon we take over), so it's not a major problem.

Re. the model for classifying the last 24-48 hours all at once:

- Should we include the `activity_verified` datapoints? Including that data makes it a non-random sample with regards to time, but I don't think we'll use it in a way that needs it to be random.
- Or can we use the `activity_verified` data at all, since they're only about single instants ("I'm doing X right now this nanosecond")?
- (One option: carve the data into chunks delineated by `buffer_kind` changes. There is then an `activity_verified` value attached to every buffer focus during these chunks.)

General questions:

- How should we model the relation between `activity` and `time.since.bufkind.change`? (use an exponential decay)
- Should there be an interaction between `buffer_kind` and `time.since.bufkind.change`?
- Should we also include `buffer` and another constructed variable, `time.since.buf.change`? Tentatively, I think not.
- Should `activity_verified`'s missingness process (`missingness_verification`) be modeled?
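One reading of "use an exponential decay" is to transform the raw seconds into a bounded predictor before it enters the model, rather than feeding in `time.since.bufkind.change` directly. A minimal sketch, with a guessed (untuned) half-life:

```python
def decayed(t, half_life=300.0):
    """Exponential-decay transform of time.since.bufkind.change (seconds):
    1.0 immediately after a buffer_kind change, halving every `half_life`
    seconds. The 300 s default is an illustrative guess, not a tuned value."""
    return 0.5 ** (t / half_life)
```

The appeal is that a recent `buffer_kind` change then has a large, smoothly fading influence, instead of the unbounded linear effect a raw seconds count would imply.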