meedstrom / eva

Emacs-based Virtual Assistant
GNU General Public License v3.0

Guess activity #4

Open meedstrom opened 3 years ago

meedstrom commented 3 years ago

For background and all the theory, see the README.

Current questions on the stats theory

Re. the model for realtime guesses:

Re. the model for classifying the last 24-48 hours all at once:

General questions:

meedstrom commented 3 years ago

Recent thought: the past-classification model could be implemented by carving up the last 24-48 hours into small time blocks. These blocks are most naturally delineated by the times of buffer change (not buffer_kind change, just buffer change!). So some blocks would last mere seconds, and a rare few could last hours. Within a block all variables stay fixed, since they only change when a buffer change occurs, so you can assume they are correct for the entire block.
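A minimal sketch of that carving step, in Python for illustration only. It assumes the raw data is a list of `(timestamp, buffer_name)` focus events; the `Block` type, the `carve_blocks` name and the event format are all hypothetical and not taken from the package.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Block:
    start: datetime
    end: datetime
    buffer: str


def carve_blocks(events, window_end):
    """Turn (timestamp, buffer_name) focus events into contiguous blocks.

    Each block runs from one buffer change to the next, and the last
    block is closed at window_end. Repeated events for the buffer that
    is already focused are skipped, since only an actual buffer change
    starts a new block. (Hypothetical input format, for illustration.)
    """
    blocks = []
    for when, buffer in sorted(events):
        if blocks and blocks[-1].buffer == buffer:
            continue  # same buffer still focused, no new block
        if blocks:
            blocks[-1].end = when  # close the previous block here
        blocks.append(Block(start=when, end=window_end, buffer=buffer))
    return blocks
```

With events at 20:00:00 (`a.org`), 20:00:05 (`b.el`) and 22:00:00 (`a.org`), this gives one block of five seconds and one of nearly two hours, which matches the expectation that block lengths vary wildly.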

Of course, if a buffer is under focus for 3 hours but the user is idle for 1h40 of that, then it was really only under focus for 1h20. Perhaps we'll insert idle blocks into the dataset as a sort of pseudo-buffer, to reflect the fact that the user isn't focusing on any buffer during that time.
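One way the idle pseudo-buffer could work, sketched in Python. It assumes blocks are `(start, end, buffer)` tuples and idle periods are `(start, end)` tuples, each list sorted and non-overlapping; the `*idle*` name and the `insert_idle` function are made up for this example.

```python
from datetime import datetime, timedelta

IDLE = "*idle*"  # hypothetical pseudo-buffer name


def insert_idle(blocks, idle_periods):
    """Split (start, end, buffer) blocks around (start, end) idle periods.

    Both inputs are assumed to be sorted and non-overlapping within
    themselves. The idle part of a block is relabelled as IDLE and the
    remaining pieces keep the original buffer.
    """
    out = []
    for start, end, buffer in blocks:
        cursor = start
        for idle_start, idle_end in idle_periods:
            # Clip the idle period to this block.
            lo, hi = max(idle_start, start), min(idle_end, end)
            if lo >= hi:
                continue  # this idle period does not touch the block
            if cursor < lo:
                out.append((cursor, lo, buffer))
            out.append((lo, hi, IDLE))
            cursor = hi
        if cursor < end:
            out.append((cursor, end, buffer))
    return out
```

For the example above (3 hours of focus with 1h40 idle in the middle), the single block becomes three: a focused piece, an idle piece, and another focused piece, with the focused pieces adding up to 1h20. An idle period that spans a buffer change comes out as two adjacent idle blocks in this sketch; merging them would be a separate step.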

In my experience, oftentimes the user will land in a situation where they are rapidly switching between buffers that belong to different buffer_kind categories. If we naively classify these without any regard to the fact of the rapid switching, we'll end up with Org clock lines looking like this:

CLOCK: [2017-12-11 Mon 23:35]--[2017-12-11 Mon 23:36] =>  0:01
CLOCK: [2017-12-11 Mon 23:33]--[2017-12-11 Mon 23:34] =>  0:01
CLOCK: [2017-12-11 Mon 23:30]--[2017-12-11 Mon 23:31] =>  0:01
CLOCK: [2017-12-11 Mon 23:27]--[2017-12-11 Mon 23:28] =>  0:01
CLOCK: [2017-12-11 Mon 23:26]--[2017-12-11 Mon 23:27] =>  0:01

That is not nice, and clutters up the agenda log too. If we don't have a model component that will handle rapid switching elegantly, I'm inclined to just carve up the day into 30-minute blocks and be content with classifying each as the average of what happened within them. Such a model has its own complexities -- many variables are no longer fixed within time blocks, and we have to consider what's a good way of weighting the averages of each variable.
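A rough sketch of the 30-minute fallback, in Python. It handles only the simplest case: a single categorical variable (buffer_kind), where "average" is read as "the kind that held the most time in the bin". That reading, the `classify_bins` name and the `(start, end, buffer_kind)` tuple format are assumptions for illustration; how to weight the other variables is left open.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BIN = timedelta(minutes=30)


def classify_bins(blocks, day_start, bin_size=BIN):
    """Map each bin index to the buffer_kind holding the most time in it.

    blocks: iterable of (start, end, buffer_kind) tuples. A block that
    crosses a bin boundary contributes to each bin only the time it
    spends inside that bin. Ties go to the kind seen first in the bin.
    """
    totals = defaultdict(lambda: defaultdict(timedelta))
    for start, end, kind in blocks:
        cursor = max(start, day_start)
        while cursor < end:
            index = (cursor - day_start) // bin_size
            bin_end = day_start + (index + 1) * bin_size
            piece_end = min(end, bin_end)
            totals[index][kind] += piece_end - cursor
            cursor = piece_end
    return {i: max(kinds, key=kinds.get) for i, kinds in sorted(totals.items())}
```

Under this scheme a burst of one-minute switches inside a bin no longer produces one clock line each; it only shifts the time totals that decide the bin's single label.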