simonmichael / hledger

Robust, fast, intuitive plain text accounting tool with CLI, TUI and web interfaces.
https://hledger.org

wish: hledger should support transactions with timestamps, following ISO8601 #117

Open tphyahoo opened 11 years ago

tphyahoo commented 11 years ago

Currently hledger transactions look like

2009/1/1
  assets:cow_account    COW1   ; one cow at 100USD
  equity:cow_account   -COW1

The proposal is to allow transactions that have a precise timestamp such as

2007-04-05T14:30Z
  assets:cow_account    COW1   ; one cow at 100USD
  equity:cow_account   -COW1

Here the Z indicates the UTC time zone.

If no time zone is specified, the time zone could be inferred using rules in a config file, or directive.

simonmichael commented 11 years ago

It seems a nice option, though possibly at the cost of making it harder to provide clear parse errors. Would ledger also parse it ?

For the moment, you could also store it as a tag (metadata), like

2007-04-05                     ; time:14:30Z
  assets:cow_account   1 COW
  equity:cow_account
xiaoruoruo commented 11 years ago

Nice option. Maybe add an option to output a ledger-compatible file by emitting only the date part?

simonmichael commented 8 years ago

Do we have some use cases ?

Needs discussion/developers, closing for now.

adius commented 7 years ago

Modern payment providers like PayPal, Bitcoin, etc. provide millisecond precision for their transactions. It should be possible to reflect that in a journal. This is an absolute "must have" feature. I was really surprised that it's not already integrated!

simonmichael commented 7 years ago

Hi, why is it "must have" ?

When someone truly needs it, I expect we'll have it. For now, you can model this with a transaction tag.
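
For example, extending the tag idea above to sub-second precision (the time: tag is just a convention; hledger stores tag values as plain text and doesn't parse them as times):

  2017-06-01            ; time:14:30:05.123Z
    assets:paypal       -10 USD
    expenses:fees        10 USD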

ony commented 7 years ago

I miss that feature only for merging multiple journals. Example: I have some balance assertions/assignments from a bank account in one journal, and an automatic export of expenses on my shop account in another. Without precise times in transactions, they cannot be interleaved within a single day. Workaround: keep only one journal and periodically merge in transactions from the generated ones.

If someone starts implementing this, I would like them to consider how to treat transactions within the same day when some have times and some don't. I can imagine that a sequence of transactions without a time specification would be assigned a time range spanning between the closest pair of timed transactions, without crossing the day boundary. When merging transactions from different sources that belong to overlapping ranges, we can follow the current approach (order of read), effectively preserving current behaviour.

Imagine a file ordered 1, A, 4, 3, B, 7, 6, C, 8, where the numbers are timed transactions and the letters are untimed ones. The untimed transactions get ranges like A:(1;4), B:(3;7), C:(6;8), so there are multiple valid ways to order them. We could still settle on a rule, e.g. prefer placing timed transactions earlier while satisfying the ranges of the untimed ones, giving 1, 3, A, 4, 6, B, 7, C, 8.

ellis commented 6 years ago

@simonmichael I would love this for bookkeeping as an Amazon seller, because Amazon pays out settlements over time periods that begin and end sometime in the middle of the day. We want to validate the settlement reports, and ISO8601 timestamps would certainly be the easiest way to handle such cases. Adding a time tag seems like quite a hack, but if I use that, can you say what the query format should be for selecting the entries between two timestamps?

simonmichael commented 6 years ago

This sounds like a good use case, but can you explain it in more detail ? Why must you have a timestamp ?

ellis commented 6 years ago

Here's some context: One recent settlement covered transactions in Germany between 2017-11-08 21:22:04 UTC and 2017-11-22 21:23:20 UTC. Other countries have different settlement periods, but the periods are usually 14 days in duration.

I need to match all transactions to their invoices, and all bank transfers to the 14-day settlements. We sometimes find discrepancies in the settlement values, so we need to validate them all. Right now, I'm running these calculations in R (https://www.r-project.org/). However, it would be much nicer to just query the ledger for the transactions between two timestamps; then we could compare the ledger subtotals with reality. Does that make sense?

0ihsan commented 3 years ago

Any update on this?

simonmichael commented 3 years ago

@ellis: yes it makes sense, thanks. Here's a mockup to explore how it could be hacked with current features:

; #117 how to match between 2017-11-08 21:22:04 UTC and 2017-11-22 21:23:20 UTC ?

2017-11-08 ; time:21:22:00
  (a)        1

2017-11-08 ; time:21:22:04
  (a)        2

2017-11-22 ; time:21:23:20
  (a)        3

2017-11-22 ; time:21:23:21
  (a)        4
$ hledger print date:2017-11-08..2017-11-23
2017-11-08  ; time:21:22:00
    (a)               1

2017-11-08  ; time:21:22:04
    (a)               2

2017-11-22  ; time:21:23:20
    (a)               3

2017-11-22  ; time:21:23:21
    (a)               4

# and filter out the outside-time-bounds transactions by hand

...is what I've come up with so far. I suspect you could accomplish the precise match through clever tag:time=REGEXP queries, but it looks too hard even for a PTA fanatic. It would help a lot if tag: could match on values greater or less than something, as amt: queries can. Eg this could work, assuming well formed data:

$ hledger print date:2017-11-08..2017-11-23 tag:'time>=21:22:04' tag:'time<=21:23:20'
alerque commented 3 years ago

I've recently run into some trouble de-duplicating transaction lists, because the only uniquely identifying information would have been timestamps with more granular precision than "day". I've also run into considerable confusion generating sensible reports on cryptocurrency transactions, where the rate changes and transaction flow only really make sense on a much tighter timeline than "day".

simonmichael commented 3 years ago

@alerque where exactly do you hit trouble with current hledger... is it in filtering by time, as in the example above ? Would the proposed quick solution (tag value comparison queries) solve it ?

adrp commented 1 year ago

I'd like to ask if there's been any progress on the ISO8601 time support issue.

This feature is crucial for my workflow. Just as an example, I really like how Tackler handles ISO8601. One key reason for my request is that I often need to generate reports in multiple time zones. For instance, I might need a report in PDT and another in the CEST time zone (most, but not all, transactions will overlap).

Also, hledger serves more than just an accounting purpose for me; I need to preserve some additional data for each transaction. I believe I'm not alone, since we do have tags, after all (and they're not strictly required in accounting). This is why I think the argument that accounting only cares about day granularity, and therefore ISO8601 support is not essential, misses the bigger picture.

Not a Haskell developer, but if there's any other way I can contribute or assist with this feature, I'm more than willing to help.

simonmichael commented 11 months ago

@adrp no, things are still as they were. You could help by taking up my question above - is a tag-based solution like the one suggested useful ? What real world problems can't it handle ? Another possibility is to try and stir up support for this and attract developers, by discussion or bounties or directly funding the work (design, impact analysis and mitigation, documentation, and implementation).

alerque commented 11 months ago

Can I suggest that this issue be re-opened? I have it flagged because I'm still quite interested in tracking it, but having it closed limits discoverability for new parties.


Back to my use case: one of the most important ones for me is being able to merge ledger data from multiple files and projects and keep balance assertions intact. Right now I have to be very careful to only add assertions on days with no transactions for any accounts that I might have recorded in more than one ledger. This is an ongoing hassle. Additionally, de-duplication is unnecessarily tricky. Tags could help mitigate the de-duplication issues somewhat, but they don't help much with balance assertions, and using them involves a huge amount of extra post-processing tooling and queries.
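
To illustrate the assertion problem, a minimal hypothetical sketch (accounts and amounts made up): two files each record a same-day transaction on the same account, and whether the assertion holds depends purely on the order the files are read, since hledger checks same-day assertions in parse order.

  ; personal.journal
  2024-03-05 opening balance
    assets:checking      1000 USD
    equity:opening

  2024-03-05 groceries
    expenses:food          50 USD
    assets:checking       -50 USD = 950 USD

  ; business.journal
  2024-03-05 client refund
    assets:checking       100 USD
    income:refunds

  ; Read personal.journal first and the assertion passes (1000 - 50 = 950).
  ; Read business.journal first and the running balance is 1100 when the
  ; groceries transaction posts, so the = 950 USD assertion fails.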

simonmichael commented 11 months ago

Good idea, I have reopened it.

@alerque that sounds like a good use case, even if I can't quite picture it arising in my own usage. (I don't think I'd have transactions affecting the same bank account in different files - is it something more specialised you are doing ?)

Stealthmate commented 2 weeks ago

At the risk of expanding this issue a bit too much, I'd like to push for this feature from a slightly different and more abstract perspective.

Please correct me if I'm wrong, but the way I see it right now, there is no way to specify a total ordering independent of parse order. The biggest issue with this is that people (myself included) who keep journals in multiple files need to be very careful about file inclusion order, which transaction to put in which file, how to deal with balance assertions, etc. The next biggest issue is that concepts like "the transaction that happens at the very end of the day/month/year" are hard to define clearly (again, you need to be careful with parse order).

I'd argue that ISO8601 timestamps are the easiest way of solving this. Aside from the fact that more precise time info is generally useful or required for people like those above, timestamps give you granular enough precision to "hack" a transaction order without invalidating the meaning of the timestamp itself (for example, if I only care about precision up to the hour, I can set arbitrary minutes/seconds/milliseconds to forcibly order transactions that happened within the same hour). Unless people start running international banks with multiple transactions per millisecond on hledger, timestamps give you an easy and decent compromise, and I believe it's possible to implement them in a way that doesn't break existing hledger journals.
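
For instance (purely hypothetical syntax, assuming the feature existed; the seconds here are chosen only to force an ordering within the hour):

  2024-03-05T09:15:01 coffee
    expenses:food           3 USD
    assets:cash

  2024-03-05T09:15:02 bus ticket
    expenses:transport      2 USD
    assets:cash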

Stealthmate commented 2 weeks ago

I discussed this with Simon on Matrix, and I've started hacking away at the code in hopes of making some progress on this issue.

simonmichael commented 2 weeks ago

Cc'ing a few notes from there:

ony’s comment explores some of the complexities when merging mixed timed/untimed entries

I imagine a long period of testing and refinement before [each] merge to master - we don’t want to destabilise or complicate hledger for the majority not using this

with a working prototype I expect I, for one, will get a more detailed understanding of how this can be used

(and I agree it would be nice to have on principle, if we can afford it)

Q: are people who write a time going to be required to always write the timezone as well ? If not, how/when will that be detected ? Eg will a TZ directive be needed ? (analogous to Y directive)

A: Allow users to specify a TZ globally (via directive). If not specified, this directive defaults to UTC. If TZ is specified in the timestamp (e.g. 2024-01-01T00:00:00+02:00), use that. If TZ is not specified in the timestamp, use the global TZ (UTC by default, unless specified otherwise).
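
A sketch of how those rules might look in a journal (hypothetical directive and timestamp syntax, nothing hledger supports today):

  timezone Europe/Berlin            ; proposed global default; UTC if omitted

  2024-01-01T09:00:00+02:00 a       ; explicit offset in the timestamp wins
    (a)    1

  2024-01-01T09:00:00 b             ; no offset: interpreted as Europe/Berlin
    (b)    1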

tphyahoo commented 2 weeks ago

I am the submitter of this issue 10 years ago.

I now have doubts about whether it was a good idea, based on experiences I have had since then.

What I would now recommend is posix timestamps:

$ date +%s            
1725297599
$ date -ur 1725297583  
Mon Sep  2 17:19:43 UTC 2024

eg, keep 1725297599 in the hledger file, and have a viewer or a parser that translates it into whatever timezone / time format is desired.

It's just that whenever a human is putting in a formatted date, bad things happen. There is no timezone, or the wrong timezone, or something is misspelled. It's all just too delicate.

So for web app (or whatever assisted app) input, there should be the usual date picker that figures out the right posix time and writes that.

Or, as a realistic workflow when working directly with text files and you only care about the date, not the time: pick noon in your timezone, get the posix time for that, and use it for all the transactions on that date. Then you sanity-check by running the file through awk or whatever to view it with human-readable dates/timestamps, but that's "view data", so to speak, and doesn't need to be saved. The canonical data that actually gets saved should be a posix timestamp.
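
For example, a rough sketch of that with GNU date (drop -u to work in your local timezone instead of UTC):

$ date -u -d '2024-09-02 12:00' +%s
1725278400
$ date -u -d @1725278400
Mon Sep  2 12:00:00 UTC 2024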

This is a "forever solution" for timezones and formatting, which is as easy to implement as possible. It comes at some usability cost, eg versus just putting 20240902 for the date. But it seems like for some workflows timestamps and timezones actually matter, and maybe even have legal importance, and bad things happen if you get them wrong. So having an operator get used to always using a posix timestamp rather than a YYYYMMDD code is a good thing in terms of clarifying the operator's own understanding of time and computers. Again, this is only if the operator wants to interact directly with text files; if they are willing to use a web app for data entry all this complexity can be hidden behind a date picker and UI, and all this stuff can be implemented differently in different apps, with posix time the canonical thing everyone understands. There are always easy libraries for getting in and out of posix time ;)

Maybe this also helps with ordering. It should also be easier to implement, at the cost of being a bit harder for the operator.

But it pushes out date wrangling and viewing to helper apps, which each can do their thing in some reasonable way.

And it leaves the possibility of interacting with text files directly using something like the workflow I described above.

The zen master koan is something like "the database should only ever store posix timestamps", but applied to hledger's text files.

If this is a convincing argument, the next step would be to close this feature request and open another one. I haven't been an hledger user for some time, so maybe someone else should open that feature request.

I will wait for feedback before picking any direction or decision.

tphyahoo commented 2 weeks ago

A potential compromise would be to allow two formats in the hledger source-of-truth text files:

POSIX1725297599

or

UTC20240902

which should parse to noon UTC on that day.

I was tempted to allow a UTC-4_20240902

to indicate noon eastern time, but then I remembered fractional time zones

https://www.timeanddate.com/time/time-zones-interesting.html

and thought "this isn't a good compromise anymore"

In practice, for non-UTC time zones, operator error or confusion will sometimes (maybe even often) result in

UTC20240902

type dates being saved for the wrong day for some given data. But this could be viewed as a reminder to the user that if you're not entering timestamps, all bets are off anyway.

If the data-entry person wants to be really careful with this, they can always keep a web tab open for sanity-checking the UTC time at the moment of data entry, or the UTC time of some timestamp.

I think UTC20240902 is easy to write and easy to implement.

Whereas doing the full ISO8601 thing will result in more frustration both in implementation and in use.

And if you are sloppy with UTC, worst case you are off by a day in one direction or the other, in cases where you weren't keeping timestamps anyway.

The important thing is to be unambiguous in your source of truth.

alerque commented 2 weeks ago

I'm also a proponent of being able to save finer-grained timestamps. However, I'm pretty sure forcing data to be entered/saved with higher precision than the real-life information would be worse than not being able to enter it at all. It would be an unnecessary source of data-entry friction and cause much of the data to have made-up values that do not reflect reality. The difference between the exact-second timestamps on my imported bank wire transfers and the day resolution on my cash entries ("my wife took ₺200,00 out of my wallet sometime this morning") would create more false information than true, and when querying the data it would be unclear which values were made up because the format forced them and which were real.

tphyahoo commented 2 weeks ago

alerque -- was this replying to both my comments, or only the first?

In the second I recommended a UTC-date-only compromise that allows (and communicates) imprecision in the data (plus or minus one day, say).

Thus one format communicates that the user doesn't care too much about precision, time, and timezone; the other communicates that the user is at least trying to nail down a precise time.

alerque commented 2 weeks ago

alerque -- was this replying to both my comments, or only the first?

Both.

It's just that whenever human is putting in the formatted date, bad things happen.

Then why use #PlainTextAccounting at all? All fields are (or can be) manually entered text.

Optionally, hledger can validate that all used accounts are declared, payees are in a known list, and commodities are expected ones. You can write a parser to make sure all the codes cross-reference with your bank's ids, or not. I recently discovered that I had a number of transactions entered with the wrong commodity symbol instead of £, which was throwing off my GBP balances.
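
(For reference, recent hledger versions can run such checks explicitly, something like:)

$ hledger check accounts commodities payees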

All anecdotal, but my point is that I think the date fields should be relatively free-form, as long as they parse to a valid timestamp using whatever date-parsing library gets used. Whether people choose to handle timezones correctly or not is not hledger's problem, as long as it gives the people who do want to the tools to do so.

The "assume noon UTC if not otherwise specified" would wreak havoc on my books. The three most common time zones I have transactions in are GMT-7, UTC+3, and UTC+7.

Stealthmate commented 2 weeks ago

Wow, it's nice to see this much activity, especially for a 10-year old issue!

I do agree with @alerque about the UX part:

It would be an unnecessary source of data entry friction

As a daily user of hledger myself, I can't really imagine myself doing mental gymnastics to convert to/from POSIX timestamps every time I record some random cash expense.

That being said

cause much of the data to have made up values that do not reflect reality.

I don't think this is necessarily a bad thing. Currently we have resolution up to the day, and while that does reflect reality, it certainly destroys information about transactions where the time of day actually matters. On the other hand, if we switch to timestamps (whether POSIX or ISO8601), we would have to fill in the time of day internally somehow for transactions that don't have it, and that would inevitably create "made-up values". But if that's considered a bad thing, then our only options are to either force everyone to write the exact time of day (in which case people will pick the made-up values manually) or not use time of day at all (the current implementation).

The "assume noon UTC if not otherwise specified" would wreak havoc on my books. The three most common time zones I have transactions in are GMT-7, UTC+3, and UTC+7.

@alerque This part I'm actually curious about. Again, for the same reasons as above, we probably need to assume some timezone by default (or let people specify it via a directive, see Simon's comment above). Would this still break your books, and if so, why? How are your books currently structured with regard to timezones? Is everything just mixed in one file?

tphyahoo commented 2 weeks ago

OK, maybe my proposal was unrealistic given the user-input friction it would cause.

@alarque "I think the date fields should be relatively free-form as long as it parses to a valid timestamp"

By "valid timestamp" you mean what? I usually mean a 32 bit unix/posix timestamp.

Problem is, how do parse "20240902" to a timestamp, with or without a timezone.

https://en.wikipedia.org/wiki/ISO_8601

talks about reduced precision and duration.

So should hledger have these notions? I.e. should 20240902 (a day) be a 24-hour duration? Sounds complicated. This kind of complexity is what inspired the zen koan about always storing posix time, and my (admittedly not functional) proposal.

These kinds of subtleties and nuances are what made me throw my hands up and push for storing time as posix seconds, or else admitting that you are waving your hands about the true time.

https://downloads.haskell.org/ghc/latest/docs/libraries/time-1.12.2-dfcf/Data-Time-Format-ISO8601.html#t:Format

has 8 instances of the ISO8601 class, including UTCTime, Day, CalendarDiffDays, CalendarDiffTime, LocalTime, TimeOfDay, TimeZone and ZonedTime.

If I understand correctly, the parse function in that library can parse to any of the above data types.
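
For instance, a quick GHCi sketch (from memory, output approximate):

$ ghci
ghci> import Data.Time
ghci> import Data.Time.Format.ISO8601
ghci> iso8601ParseM "2017-11-08T21:22:04Z" :: Maybe UTCTime
Just 2017-11-08 21:22:04 UTC
ghci> iso8601ParseM "2017-11-08" :: Maybe Day
Just 2017-11-08
ghci> iso8601ParseM "2017-11-08T21:22:04+02:00" :: Maybe ZonedTime
Just 2017-11-08 21:22:04 +0200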

Day and UTCTime sound morally close to what I proposed earlier. The other types, I don't know. I see all this complexity and my instinct is to reject it and insist on something simple, discoverable and sane. I'm not sure what that should be. Probably not what I proposed earlier.

But something?

To focus, my primary question is: what should the target "timestamp" type of whatever parsing happens be? What type is a timestamp?

Okay, it's not a traditional 32 bit unix timestamp.

But then what is it?

simonmichael commented 2 weeks ago

Hi Thomas! Hope you’re well.

It’s great that we’re brainstorming - helps reduce costly mistakes. Time is indeed very complex but I think we have the language, libraries and dev style to handle it a little (famous last words 😅).

Internally, Val first proposed using POSIXTime but that would not allow representing historical data before 1970. Currently we can represent data from AD 0 (but not before that alas). Next we thought of ZonedTime as being closest to the representation that users are working with. But that has no Eq instance, I presume for reasons. So currently we’re assuming it’ll be UTCTime.
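
For example, normalising a parsed ZonedTime to a UTCTime (which does have Eq and Ord) is straightforward with the time library; a rough GHCi sketch:

ghci> import Data.Time
ghci> import Data.Time.Format.ISO8601
ghci> zt <- iso8601ParseM "2024-01-01T09:00:00+02:00" :: IO ZonedTime
ghci> zonedTimeToUTC zt
2024-01-01 07:00:00 UTC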

Externally, we’re focussed on the journal format here, not so much on UI concerns. In the journal file, we’d certainly like to keep things as convenient as possible while still being principled underneath. That could mean eg:

• Transactions are not required to have times (of course)
• Transactions with and without times can be mixed freely in one file
• Times and zones can be written in any of the usual main ISO spellings
• Zones are optional and will be inferred
• Seconds and fractions of seconds are optional
• Files can always be interpreted unambiguously and deterministically, regardless of their sequence, environment, platform etc.

tphyahoo commented 2 weeks ago

Thanks Michael, all's well ;)

Those are some fascinating (and reasonable) compromises.

Many users, many use cases. Quite the journey, and accomplishment.