nomeata / arbtt

arbtt, the automatic rule-based time-tracker
http://arbtt.nomeata.de/
GNU General Public License v2.0
312 stars 36 forks source link

Lots of unmatched time when using a simple rule that should tag everything #129

Open nomeata opened 3 years ago

nomeata commented 3 years ago

Original report by Konzertheld (Bitbucket: Konzertheld, GitHub: Konzertheld).


Hello there,

I have a simple rule in my configuration: tag Program:$current.program that should tag every entry. When I run statistics: arbtt-stats -c Program --for-each Month -m 0.1 I get a lot of unmatched time though. What could be happening?

I use arbtt-stats version 0.10.4 on Manjaro Linux. Following is the entire categorizing file:

-- -*- mode: haskell; -*-
-- Comments in this file use the Haskell syntax:
-- A "--" comments the rest of the line.
-- A set of {- ... -} comments out a group of lines.

-- This defines some aliases, to make the reports look nicer:
aliases (
    "sun-awt-X11-XFramePeer"  -> "java",
    "sun-awt-X11-XDialogPeer" -> "java",
    "sun-awt-X11-XWindowPeer" -> "java",
    "gramps.py"               -> "gramps",
    "___nforschung"           -> "ahnenforschung",
    "Pidgin"                  -> "pidgin"
    )

-- A rule that probably everybody wants. Being inactive for over a minute
-- causes this sample to be ignored by default.
$idle > 60 ==> tag inactive,

-- A rule that matches on a list of strings
current window $program == ["Navigator","galeon"] ==> tag Web,

current window $program == "sun-awt-X11-XFramePeer" &&
current window $title == "I3P"
  ==> tag Program:I3P,

current window $program == "sun-awt-X11-XDialogPeer" &&
current window $title == " " &&
any window $title == "I3P"
  ==> tag Program:I3P,

-- Simple rule that just tags the current program
tag Program:$current.program,

-- Another simple rule, just tags the current desktop (a.k.a. workspace)
tag Desktop:$desktop,

tag Title:$current.title,

-- I'd like to know what evolution folders I'm working in. But when sending a
-- mail, the window title only contains the (not very helpful) subject. So I do
-- not tag necessarily by the active window title, but the title that contains
-- the folder
current window $program == "evolution" &&
any window ($program == "evolution" && $title =~ /^(.*) \([0-9]+/)
  ==> tag Evo-Folder:$1,

-- A general rule that works well with gvim and gnome-terminal and tells me
-- what project I'm currently working on
current window $title =~ m!(?:~|home/jojo)/projekte/(?:programming/(?:haskell/)?)?([^/)]*)!
  ==> tag Project:$1,
current window $title =~ m!(?:~|home/jojo)/debian!
  ==> tag Project:Debian,

-- This was a frequently looked-at pdf-File
current window $title =~ m!output.pdf! &&
any window ($title =~ /nforschung/)
  ==> tag Project:ahnenforschung,

-- My diploma thesis is in a different directory
current window $title =~ [ m!(?:~|home/jojo)/dokumente/Uni/DA!
                         , m!Diplomarbeit.pdf!
                         , m!LoopSubgroupPaper.pdf! ]
  ==> tag Project:DA,

current window $title =~ m!TDM!
  ==> tag Project:TDM,

( $date >= 2010-08-01 &&
  $date <= 2010-12-01 &&
  ( current window $program == "sun-awt-X11-XFramePeer" &&
      current window $title == "I3P" ||
    current window $program == "sun-awt-X11-XDialogPeer" &&
      current window $title == " " &&
      any window $title == "I3P" ||
    current window $title =~ m!(?:~|home/jojo)/dokumente/Uni/SA! ||
    current window $title =~ m!Isabelle200! ||
    current window $title =~ m!isar-ref.pdf! ||
    current window $title =~ m!document.pdf! ||
    current window $title =~ m!outline.pdf! ||
    current window $title =~ m!Studienarbeit.pdf! )
) ==> tag Project:SA,

-- Out of curiosity: what percentage of my time am I actually coding Haskell?
current window ($program == "gvim" && $title =~ /^[^ ]+\.hs \(/ )
  ==> tag Editing-Haskell,

{-
-- Example of time-related rules. I do not use these myself.

-- To be able to match on the time of day, I introduce tags for that as well.
-- $time evaluates to local time.
$time >=  2:00 && $time <  8:00 ==> tag time-of-day:night,
$time >=  8:00 && $time < 12:00 ==> tag time-of-day:morning,
$time >= 12:00 && $time < 14:00 ==> tag time-of-day:lunchtime,
$time >= 14:00 && $time < 18:00 ==> tag time-of-day:afternoon,
$time >= 18:00 && $time < 22:00 ==> tag time-of-day:evening,
$time >= 22:00 || $time <  2:00 ==> tag time-of-day:late-evening,

-- This tag always refers to the last 24h
$sampleage <= 24:00 ==> tag last-day,

-- To categorize by calendar periods (months, weeks, or arbitrary periods),
-- I use $date variable, and some auxiliary functions. All these functions
-- evaluate dates in local time. Set TZ environment variable if you need
-- statistics in a different time zone.

-- You can compare dates:
$date >= 2001-01-01 ==> tag this_century,
-- You have to write them in YYYY-MM-DD format, else they will not be recognized.

-- “format $date” produces a string with the date in ISO 8601 format
-- (YYYY-MM-DD), it may be compared with strings. For example, to match
-- everything on and after a particular date I can use
format $date =~ ".*-03-19"  ==> tag period:on_a_special_day,
-- but note that this is a rather expensive operation and will slow down your
-- data processing considerably.

-- “day of month $date” gives the day of month (1..31),
-- “day of week $date” gives a sequence number of the day of week
-- (1..7, Monday is 1):
(day of month $date == 13) && (day of week $date == 5) ==> tag day:friday_13,

year $date == 2010 ==> tag year:2010,

-- “$now” evaluates to the current time
day of month $now == day of month $date ==> tag current-day,
month $now == month $date ==> tag current-month,
year $now == year $date ==> tag current-year,
-}
-- “month $date” gives a month number (1..12), “year $date” gives a year:
month $date == 1 ==> tag month:Januar,
month $date == 2 ==> tag month:Februar,
month $date == 3 ==> tag month:März,
month $date == 4 ==> tag month:April,
month $date == 5 ==> tag month:Mai,
month $date == 6 ==> tag month:Juni,
month $date == 7 ==> tag month:Juli,
month $date == 8 ==> tag month:August,
month $date == 9 ==> tag month:September,
month $date == 10 ==> tag month:Oktober,
month $date == 11 ==> tag month:November,
month $date == 12 ==> tag month:Dezember,

(day of month $date == 10) && (month $date == 11) ==> tag DoM:7th,
(day of month $date == 14) && (month $date == 11) ==> tag DoM:14th,
(day of month $date == 21) && (month $date == 11) ==> tag DoM:21th,
(day of month $date == 28) && (month $date == 11) ==> tag DoM:28th,

--------------- ADDED 2020 -----------------
--------------------------------------------

current window $title =~ m!de\.wikipedia\.org! ==> tag Service:Wikipedia,

Thanks for reading and hopefully someone has an idea. Or maybe I found some bug, who knows. :slight_smile:

nomeata commented 3 years ago

Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).


Odd indeed on first glance. Maybe you can minimize the configuration and check if it also occurs if you have just that rule?

Also, arbtt-stats --dump-samples is very useful, as it shows the raw data next to the tags.

nomeata commented 3 years ago

Original comment by Konzertheld (Bitbucket: Konzertheld, GitHub: Konzertheld).


Thanks for your answer. I tried a configuration file that contains nothing but the rule mentioned in the first sentence and used the same command mentioned above again, nothing changed. So, it’s not related to anything else in my config.

Then I learned I should not use --dump-samples without less. :laughing:

I then ran arbtt-stats --dump-samples --categorizefile=/home/christian/tmp/testcategorizearbtt | grep "^\W+Program:$" -B 8 | grep "(*)". A number of recurring windows appeared (sample):

() : franklin.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() : franklin.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() : franklin.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() : glxgears
() : glxgears
() : glxgears
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() : VirtualBox - Error In suplibOsInit
() processing-app-Base: 11-Arduino-I2C-Gangschalthebel-live | Arduino 1.8.13
() : fitzgerald.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() google-chrome: BigBlueButton - NaGruSe-Christian - https://fitzgerald.hrz.tu-chemnitz.de/ - Google Chrome
() : fitzgerald.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() processing-app-Base: 11-Arduino-I2C-Gangschalthebel-live | Arduino 1.8.13
() : fitzgerald.hrz.tu-chemnitz.de hat ein Fenster freigegeben.
() processing-app-Base: 11-Arduino-I2C-Gangschalthebel-live | Arduino 1.8.13
(*) : fitzgerald.hrz.tu-chemnitz.de hat ein Fenster freigegeben.

I find it interesting that some do have a program associated and some do not. The first lines are from BigBlueButton, a video conference system, when I shared my screen (propably). I had hope that would explain a lot of the time as I worked as a lecturer and spent quite some time in that system. So I updated my test config to match most of the lines from my search. It reduced the amount of unmatched time by amazing 24 seconds, leaving 4 days and almost 10 hours. So, I still have no clue. :slight_smile:

The lines that my grep search turned out are not too many, so I also have doubt those lines explain the problem at all.

Edit: I missed the inactive tag clause in my test config. I added it. It only accounts for 2 hours of those 4+ days. (That way, I also confirmed that filtering out inactive time works.)

nomeata commented 3 years ago

Original comment by Konzertheld (Bitbucket: Konzertheld, GitHub: Konzertheld).


Additional info from things I came up with today. An even simpler rule tag bla:asdf that tags everything with something static works. It matches 100% of the entries. current window $program == "" ==> tag lol:lol, matched nothing.

nomeata commented 3 years ago

Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).


So the problem is that current window $program is empty for many windows?
And then additionally it is not possible to match on current window $program?
I also don’t really have a clue :confused:
Could you maybe do a fresh recording (so that we have only a few samples to look at) and send the the capture.log and categorize.cfg ? Then I can maybe debug this.

nomeata commented 3 years ago

Thanks for the test data! When I look at

arbtt-dump -f capture.log

I see that quite a few samples don't have an active window (the (*) in the first column). But some do. This would explain why current window-based rules fail, when there is no current window.

So the question is why that happens. Maybe there are modal dialogs open that are not recognized by arbtt? Or maybe one of the used applications is somehow confusing arbtt?

Konzertheld commented 2 years ago

New approach: Are there known issues with Wayland? I saw that there is a wayland compatible port of arbtt-capture. My system uses Wayland. (Asking because today I again have 6h 50min uptime but only 6h 1min captured including inactive times)

nomeata commented 2 years ago

Or maybe arbtt-capture's wake up interval isn't precise enough? If arbtt-capture takes 1s, and you set the interval to 10s, if I did the sleep naively, it might explain 10% lost time.

What's your interval time?

nomeata commented 2 years ago

(I don't use Wayland yet, and have no idea how arbtt will behave there)

Konzertheld commented 2 years ago

Hmm, my interval is 3 seconds. I can change it to 10 and see if it makes a difference.

nomeata commented 2 years ago

Ah, that might be an explanation. arbtt-capture is a bit naive in it’s interval calculation, and will always sleep the configured 3s between taking captures; not taking the time it takes to take the capture into account. At the default capture interval of one minute this might be ok, but with very short capture times, the error accumulates.

It shouldn’t be too hard to fix that, though.

Konzertheld commented 2 years ago

Ah well, if a fix is possible, that's even better. I use short intervals because I got quite a few captures of windows I accidentally alt-tabbed to which were then logged as whatever interval I had set instead of the 200 ms I actually focused them. With short intervals, those captures will disappear in irrelevance.

nomeata commented 2 years ago

My theory is that even with 60s interval, this will even out (you’ll catch them less often), but yes, depending on your needs and use cases this might be too imprecise. At the cost of a greatly increased log file, of course.

Konzertheld commented 2 years ago

Hmm, I'm trying to come up with a mathematical explanation but from my first thoughts, I think you're right.

Konzertheld commented 2 years ago

Update: It seems to decrease with increased intervals. I had 12 minutes leakage on 6 hours today with an interval of 10 seconds.

nomeata commented 2 years ago

145 might improve the situation. Do you want to try it out? Can you build from a branch or need help with that?

Konzertheld commented 2 years ago

145 might improve the situation. Do you want to try it out? Can you build from a branch or need help with that?

I can build from a branch. I thought I already did but I built master. Interestingly, it is extra bad today, almost a third of the time is lost. But I will try the branch now for the next few hours.

Konzertheld commented 2 years ago

Is this command correct? I get a weird output trying to check if the manually compiled version from the branch works at all.

dist/build/arbtt-stats/arbtt-stats -c Project --filter '$sampleage <= 00:05' --also-inactive                                                          ✔  33s   ghc-9.0.1  
Processing data [=======================================================================================================================================================================================================================] 101%
Processing data [=======================================================================================================================================================================================================================] 101%
Statistics for category "Project"
=================================
______________Tag_|_________Time_|_Percentage_
Project:MailsNews | 43d06h48m50s |      68.10
    Project:arbtt | 20d06h33m10s |      31.90
     (total time) | 63d13h22m00s |     100.00

Not only did it ignore the filter, the results are also completely wrong.

nomeata commented 2 years ago

This might be an old file, look at the time-stamp. Modern cabal puts its files in dist-newstyle.

Konzertheld commented 2 years ago

Nope, the directory did not exist before and the timestamp matches what I expected it to be.

Konzertheld commented 2 years ago

It was probably a bad idea that I just continued to use the old log file with no migration, right?

Konzertheld commented 2 years ago

Well, this is interesting. I tried a new logfile. It is already 27 MB large after not even five minutes. Processing takes way too long. While below, there are supposedly 4 days recorded, there are now 31 days recorded already. So either I messed up the build completely or something does not work at all.

ps aux | grep arbtt
[...] /AUR/arbtt-git/dist/build/arbtt-capture/arbtt-capture --sample-rate=10 --logfile=...experimental-logfile

dist/build/arbtt-stats/arbtt-stats -c Project --filter '$sampleage <= 00:05' --also-inactive --logfile=...experimental-logfile
Processing data [=======================================================================================================================================================================================================================] 101%
Processing data [=======================================================================================================================================================================================================================] 102%
Processing data [=======================================================================================================================================================================================================================] 103%
Processing data [=======================================================================================================================================================================================================================] 104%
Processing data [=======================================================================================================================================================================================================================] 105%
Statistics for category "Project"
=================================
__________Tag_|_________Time_|_Percentage_
Project:arbtt |  4d23h10m00s |     100.00
 (total time) |  4d23h10m00s |     100.00

Build commands were taken from the AUR package, I just removed those that write to system directories:

runhaskell Setup configure -O --enable-shared --enable-executable-dynamic --disable-library-vanilla \
        --dynlibdir=/usr/lib --libsubdir=\$compiler/site-local/\$pkgid

runhaskell Setup build
nomeata commented 2 years ago

Hmm, maybe my changes on the branch are just broken. :-/. Will have to careful look at them again.

nomeata commented 2 years ago

Did you run multiple arbtt-capture instances at the same time maybe?

Konzertheld commented 2 years ago

Did you run multiple arbtt-capture instances at the same time maybe?

no :)