timeclock: better support for collecting data in multiple files ?

simonmichael commented 9 months ago

This came up in a mastodon/matrix chat:

nobodyinperson:

The merge conflicts I encountered with timeclock came up because my (emphasis on my) checkin/out commands appended to the same file. I would frequently checkin and out from different devices (e.g. checkout from phone in the elevator down, no reception there), but then I couldn't get the newest additions of the day due to lack of internet. Local checkout commit conflicts with all new commits of the day because all just append to the same file. Same with timewarrior. Randomizing locations in the file could reduce but not fully prevent conflicts.

The basic append workflow works if one has internet always everywhere - a weird requirement if one has a distributed offline system like git underneath.

IIRC, timeclock also can't track simultaneous things, same as timewarrior.

sm

with current hledger, it seems you can only have one open session per file. Multiple open sessions split across files works.

secondly, I see that you can't have a clockout in a file without having a preceding clockin in that same file

thirdly I think the clockout account is ignored, it will close any preceding clockin in that file (same as lack of multi-session support)

fourthly I see how updating the same file in two repos and then merging gives a git conflict (my brain was seeing it as an "obvious merge" that should work...)

I can do a merge manually by sorting both files: sort a.timeclock b.timeclock | sponge a.timeclock, in which case I don't get the commit history from b's repo

The multiple sessions limitation is noted as #2141. This issue covers two more problems with logging hledger timeclock data on multiple devices:

1. Merging changes from multiple places, with two sub-cases:

a. Not using version control, or only one file using version control

In this case you can merge manually by sorting all files' content into the home file [and committing that].

b. All files using version control, and a wish to preserve and merge the commit histories from all devices

In this case, pulling from the other repos creates ugly merge conflicts (with git at least; not tested with darcs or pijul).

2. Clockins/clockouts split across files

hledger won't read the multiple timeclock files unless you merge them first, because it currently doesn't allow a clockout in file B for a clockin in file A.

jamescooke commented 9 months ago

Could you help clarify the following?

hledger won't read the multiple timeclock files unless you merge them first

However, I've logged all my work in 2023 with one timeclock file per week (for example 2023_w01_20230102.timeclock). Those are unified in a journal file 2023.journal which has:

include 2023_*.timeclock

I can then run hledger -f 2023.journal bal just fine. Is this what you mean by "merge them first"?

simonmichael commented 9 months ago

Yup, I meant with "Clockins/clockouts split across files". I think you aren't clocking in in one file and clocking out in another. As long as you keep clockins/outs nicely paired in each file, including them works fine as you say.

simonmichael commented 9 months ago

By "merge them first" there I meant the manual merge process, like sort a.timeclock b.timeclock | sponge a.timeclock.
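A minimal sketch of that manual merge in Python, for illustration only. A plain lexical sort compares the leading i/o code first, so this sketch orders lines by their date and time fields instead. It assumes every non-blank line is an i or o record in the `YYYY/MM/DD HH:MM` form used in this thread (no comments, no seconds). The names `parse_time` and `merge_timeclock` are hypothetical and not part of hledger.

```python
from datetime import datetime

def parse_time(line):
    """Return the timestamp of a line like 'i 2023/12/17 23:00 github'."""
    _code, date, time = line.split()[:3]
    return datetime.strptime(f"{date} {time}", "%Y/%m/%d %H:%M")

def merge_timeclock(*texts):
    """Merge the contents of several timeclock files into one chronological list."""
    # A set drops lines that both files already share (common history).
    lines = {ln.strip() for text in texts for ln in text.splitlines() if ln.strip()}
    # Sort by timestamp; on a tie, put 'o' before 'i' so a session
    # closes before the next one opens. The line itself breaks any remaining tie.
    return sorted(lines, key=lambda ln: (parse_time(ln), ln[0] != "o", ln))

a = "i 2023/12/17 21:00 reading\no 2023/12/17 22:00\n"
b = "i 2023/12/17 23:00 github\no 2023/12/18 01:00\n"
for line in merge_timeclock(b, a):
    print(line)
```

As with the shell one-liner, this only merges file contents; it does nothing to preserve or combine commit histories.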

jamescooke commented 9 months ago

OK so a problem scenario would be something like starting a task at 23:00 on Sunday night and finishing it at 01:00 on Monday morning - have I got that right?

If we're being strict, then the "clock in" would be at the end of one week file and the "clock out" would be in the second.

Using the following files:

week_a.timeclock

i 2023/12/17 23:00 github

week_b.timeclock

o 2023/12/18 01:00

year.journal

include *.timeclock

Running:

hledger -f year.journal bal

Gives:

hledger: Error: /tmp/tmp.98GHIeZxop/week_b.timeclock:1:1:
1 | o 2023-12-18 01:00:00
  | ^

Expected a timeclock i entry but got o.
Only one session may be clocked in at a time.
Please alternate i and o, beginning with i.

Would this issue, if fixed, solve this problem?

simonmichael commented 9 months ago

Yes, making this more flexible (eg, ignoring "orphan" clockouts) would fix the "2. Clockins/clockouts split across files" problem.

jamescooke commented 9 months ago

Could you help clarify some of that planned flexible behaviour for multiple files? So, given that we have the following 3 timeclock files:

i 2023/12/17 21:00 reading
i 2023/12/17 23:00 github

o 2023/12/18 01:00

o 2023/12/18 03:00

And given there is a basic journal that includes all the timeclock files:

include *.timeclock

Then when we ask hledger for the balance with:

hledger -f year.journal bal

Will we receive:

               4.00h  github
               4.00h  reading
--------------------
               8.00h

Or

               2.00h  github
               6.00h  reading
--------------------
               8.00h

Thanks 🙏🏻

simonmichael commented 9 months ago

It needs some design, indeed. A bit of research:

https://hledger.org/dev/hledger.html#timeclock and doc source, tests, implementation - hledger's timeclock format. hledger currently requires a strict i, o, i, o alternation, so o always refers to the preceding i and you can only clock in to one account at a time.
https://ledger-cli.org/doc/ledger3.html#Time-Keeping - Ledger's timeclock format. This says "You can be checked-in to multiple accounts at a time, if you wish" but doesn't specify how o's are matched with i's in that case. I think if you name an account in the o record, it closes the most recent open i for that account, otherwise the most recent open i for any account (untested). [Correction, it seems: you can't have two open i's to the same account, the o account must match an open i account exactly, and you need one space after the time or it will get confused about the accounts. Also note Ledger requires dashed dates and requires seconds in the times.]
https://www.emacswiki.org/emacs/TimeClock - notes on the Emacs timeclock.el package. And also an extension timeclock-x.el, which says "With timeclock-x (as with timeclock) when you clock-in you will be asked for the name of a project, but then when you clock-out, in addition to being asked for a reason why, you’re also given the option of typing in a multi-line comment where you can say a few words about what you were doing."
https://github.com/emacs-mirror/emacs/blob/master/lisp/calendar/timeclock.el - the current timeclock.el and its git history, which dates from the present back to August 2000 or maybe even before (!). Timeclock.el predates Ledger, which began in 2003.
https://www.gnu.org/software/emacs/manual/html_node/emacs/Time-Intervals.html - the part of the Emacs manual that mentions timeclock.

simonmichael commented 9 months ago

So following the [imagined] Ledger design, I'd expect the second output - the first, generic, o closes the most recent i (github), and the second o closes the next most recent i (reading).

simonmichael commented 9 months ago

I did some testing and updated the notes above. Ledger, like hledger, does not allow an orphaned o in a file of its own. But if you combine the above examples into one file, I confirmed that Ledger calculates 6h for reading and 2h for github.
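The matching rule discussed above can be sketched in Python as follows. This is an illustration of the imagined design, not Ledger's or hledger's actual implementation: an o with no account closes the most recent open i, and an o naming an account closes the most recent open i for that account. The function name `hours_per_account` is hypothetical, and the sketch assumes `YYYY/MM/DD HH:MM` timestamps with everything after the time treated as the account name.

```python
from datetime import datetime

def hours_per_account(lines):
    """Total clocked hours per account, allowing several open sessions."""
    open_sessions = []  # stack of (account, start); most recent last
    totals = {}
    for line in lines:
        code, date, time, *rest = line.split(maxsplit=3)
        when = datetime.strptime(f"{date} {time}", "%Y/%m/%d %H:%M")
        account = rest[0].strip() if rest else None
        if code == "i":
            open_sessions.append((account, when))
        elif code == "o":
            # Search backwards from the most recent open session.
            for idx in range(len(open_sessions) - 1, -1, -1):
                if account is None or open_sessions[idx][0] == account:
                    name, start = open_sessions.pop(idx)
                    hours = (when - start).total_seconds() / 3600
                    totals[name] = totals.get(name, 0.0) + hours
                    break
            else:
                raise ValueError(f"orphan clock-out: {line}")
    return totals

combined = [
    "i 2023/12/17 21:00 reading",
    "i 2023/12/17 23:00 github",
    "o 2023/12/18 01:00",
    "o 2023/12/18 03:00",
]
print(hours_per_account(combined))  # {'github': 2.0, 'reading': 6.0}
```

With the combined example, the first o closes github (23:00 to 01:00, 2h) and the second closes reading (21:00 to 03:00, 6h), matching the second of the two candidate outputs.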

simonmichael commented 9 months ago

So a question to think about, when considering multiple input files, possibly with multiple formats: What time-related checks are performed on (a) individual timeclock files, (b) the combination of all timeclock files, and (c) the overall Journal from all files ?

It could be, eg:

a. Check that each i or o record has valid syntax

b. Check that there are no orphan o's (all o's can be matched to a preceding i)

c. The usual whole-journal checks, if enabled, like accounts, tags, maybe commodities

simonmichael commented 9 months ago

Though that b. (allowing orphan o's in individual files, but disallowing it in the combined timeclock data) feels both tricky to implement, and likely to bring the kind of data-order-dependent fragility we prefer to avoid. So if anything, I would be more inclined to just ignore orphaned o's always. And this would sacrifice some error checking. In exchange, it would help support the "messy timeclock files" use case, where i's and o's may appear in any of several files.
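To make the trade-off concrete, here is a small Python sketch of "always ignore orphaned o's" under the current one-open-session rule. It is a hypothetical illustration, not hledger code: `pair_sessions` is an invented name, and it collects orphans in a list instead of raising an error.

```python
def pair_sessions(lines):
    """Pair i/o lines with at most one open session; skip orphaned o lines."""
    sessions = []   # (clock-in line, clock-out line) pairs
    orphans = []    # clock-outs with no open clock-in: ignored, not fatal
    open_in = None  # the currently open clock-in line, if any
    for line in lines:
        if not line.strip():
            continue
        code = line.split(maxsplit=1)[0]
        if code == "i":
            if open_in is not None:
                raise ValueError(f"already clocked in: {line}")
            open_in = line
        elif code == "o":
            if open_in is None:
                orphans.append(line)
            else:
                sessions.append((open_in, line))
                open_in = None
    return sessions, orphans, open_in

week_a = ["i 2023/12/17 23:00 github"]
week_b = ["o 2023/12/18 01:00"]
print(pair_sessions(week_b))           # the o is reported as an orphan
print(pair_sessions(week_a + week_b))  # one complete session
```

Checked file by file, `week_b` no longer fails, but its o is simply dropped and the i in `week_a` stays open, so the two-hour session only appears if the entries are paired across the combined data. That is the lost error checking mentioned above.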

It wouldn't be sufficient though; it seems likely that "messy" timeclock files would also fairly easily break the date order requirement. I don't have an example, but I'm imagining hopping back and forth between devices, clocking in and out like a madman (perhaps by automated tool), and/or mixing up the order of input files when reporting. So this use case would probably require flexibility in date order, and that would need some design also.

It's a good time to think again, how important and desirable is this use case ? Note that @nobodyinperson is implementing his own solution based on git-annex.

nobodyinperson commented 9 months ago

If there is currently a requirement to have all i in ascending order, that should be dropped in my opinion. That an o has a later time than its previous i makes sense, but requiring a general ascending order also makes tracking simultaneous events harder.

simonmichael / hledger

timeclock: better support for collecting data in multiple files ? #2142