redneb closed this 2 months ago
Thanks for the report. Is this the same as #2177, which came in just ahead of yours?
Quite likely.
It is the exact same issue as #2177
I have simplified my example even further so it just includes the date.
I'll close this in favour of #2177.
Fixed in master.
Hey @simonmichael! I just compiled hledger from master and I am getting the same problematic behavior I described in the example above:
$ hledger balance 'expr:(date:2024-01 AND acct:expense:food) OR (date:2023-12 AND acct:expense:drinks)'
10.00 expense:food
--------------------
10.00
This is with:
$ hledger --version
hledger 1.32.99-gcb0b054df-20240301, linux-x86_64
which is a version that includes 3ca208a. So it is likely that this is a different bug than #2177. Can you reopen this?
That's weird, I took care to test with your example above. Apologies..
You're right, there's more to this one that I missed. Thanks for the heads-up.
Seeking your insights once again @chrislemaire, if you remember this stuff. In postingsReport > matchedPostingsBeforeAndDuring > journalValueAndFilterPostingsWith, we can see the boolean query getting muddled as the postings report is identifying postings before and during the report period:
$ stack exec -- hledger -f 2178.j reg expr:'(date:2023 AND drinks) OR (date:2024 AND food)' --debug 4
q:
Or
[ And
[ Date DateSpan 2023
, Acct
( RegexpCI "drinks" )
]
, And
[ Date DateSpan 2024
, Acct
( RegexpCI "food" )
]
]
dateq:
Or
[ Date DateSpan 2023
, Date DateSpan 2024
]
requestedspan: DateSpan 2023-01-01..2024-12-31
journalspan: DateSpan 2023-12-22
pricespan: DateSpan ..
requestedspan': DateSpan 2023-01-01..2024-12-31
intervalspans: [ DateSpan 2023-01-01..2024-12-31 ]
reportspan: DateSpan 2023-01-01..2024-12-31
beforeandduringq:
And
[ Or
[ Acct
( RegexpCI "drinks" )
, Acct
( RegexpCI "food" )
]
, Date DateSpan ..2024-12-31
]
amtsymq: Any
reportq:
And
[ Date DateSpan ..2024-12-31
, Or
[ Acct
( RegexpCI "drinks" )
, Acct
( RegexpCI "food" )
]
]
beforestartq: Date DateSpan ..2022-12-31
postingsReport items:
...
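To see concretely why the rewritten query matches too much, here is a minimal Python sketch (not hledger code). It hand-translates q and beforeandduringq from the debug output above into predicates and evaluates both on a single posting. The Posting class and the function names are hypothetical; the posting (2023-12-22, expense:food) is assumed from the reporter's example, and the span end 2024-12-31 is assumed to be inclusive.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Posting:
    # Hypothetical stand-in for a journal posting: just a date and an account.
    day: date
    account: str


def acct(pattern: str, p: Posting) -> bool:
    # Case-insensitive regex search, mirroring RegexpCI in the debug output.
    return re.search(pattern, p.account, re.IGNORECASE) is not None


def q(p: Posting) -> bool:
    # Or [ And [Date 2023, Acct drinks], And [Date 2024, Acct food] ]
    return (p.day.year == 2023 and acct("drinks", p)) or (
        p.day.year == 2024 and acct("food", p)
    )


def beforeandduringq(p: Posting) -> bool:
    # And [ Or [Acct drinks, Acct food], Date ..2024-12-31 ]
    # The pairing between each date and its account term has been lost.
    return (acct("drinks", p) or acct("food", p)) and p.day <= date(2024, 12, 31)


posting = Posting(date(2023, 12, 22), "expense:food")
print(q(posting))                 # False
print(beforeandduringq(posting))  # True
```

The original query rejects the posting, but the split-and-recombined version accepts it, because "food in 2024" has been weakened to "food at any time up to the end of 2024".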
Since expr: queries were added, it's possible for a query (with OR) to specify multiple different date periods.
This is problematic for report semantics in several ways. For example,
expr:'(date:2023 AND drinks) OR (date:2024 AND food)'
produces two disjoint result sets, and
expr:'date:feb or date:may or date:nov'
produces three disjoint report periods with holes between them.
Can all of our reports handle holes properly, calculate historical starting balances properly, etc.?
Even though the similar issue #2177 seemed to be resolved by a fix, I think that more generally across all of our reports we really can't handle this without further thought. At least, I think holes will not always be handled correctly, and I don't know how we can calculate historical starting balances correctly when multiple starting dates are possible.
This is a release blocker, and I think in the short term, we must simply disallow OR-ing of date periods.
Any thoughts on this, pro or con, and on what the semantics should be if/when we support this?
I agree that there are some issues with the semantics of date: in disjunctive queries. However, it seems these ambiguities do not uniformly affect all commands. Commands such as aregister, activity, balance, print, and register, for instance, can work with such queries without ambiguity. They straightforwardly report transactions or tally up balances from a specified subset of transactions, a behavior likely to align with user expectations without causing surprises.
Conversely, for commands like balancesheet and incomestatement, this would indeed be problematic. For these commands, perhaps making the use of date: within expr: queries, regardless of being disjunctive or not, impermissible would be better. Users would then need to rely on flags such as -p, -b, -e, etc., to define a clearly unambiguous report period.
I believe disjunctive queries involving dates, particularly with commands like balance, have legitimate and valuable use cases. It would be a shame if this functionality were not supported merely due to complications arising in other contexts.
Here's an idea that could potentially work for all report types: For any given query, we first determine the "date envelope" period, which is the convex hull of all dates and periods mentioned across all query terms. Then, we generate a report for this period, but with a twist: we disregard transactions that do not match the query, including any date terms.
However, I've not thoroughly analyzed this idea, so there may be flaws I haven't considered.
Edit: On second thought, I am not sure that I like this idea.
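For reference, the "date envelope" idea above can be sketched roughly as follows in Python. This is only an illustration of the proposal, not hledger code: the Span type, envelope, envelope_report and the tuple-based postings are all hypothetical, and span ends are assumed to be inclusive. The envelope is the smallest single span covering every date span in the query, and the report then keeps only postings that match the full query.

```python
from datetime import date
from typing import Callable, List, Optional, Tuple

# (start, end), both inclusive; None means "open" on that side.
Span = Tuple[Optional[date], Optional[date]]
Posting = Tuple[date, str]  # (date, account), a hypothetical minimal posting


def envelope(spans: List[Span]) -> Span:
    """Smallest single span covering all given spans (their convex hull)."""
    if not spans:
        return (None, None)
    starts = [s for s, _ in spans]
    ends = [e for _, e in spans]
    # If any span is open on one side, the envelope is open on that side too.
    start = None if None in starts else min(starts)
    end = None if None in ends else max(ends)
    return (start, end)


def envelope_report(
    postings: List[Posting],
    spans: List[Span],
    matches: Callable[[Posting], bool],
) -> Tuple[Span, List[Posting]]:
    """Report period is the envelope; contents are postings matching the full query."""
    return envelope(spans), [p for p in postings if matches(p)]


def matches(p: Posting) -> bool:
    # (date:2023 AND drinks) OR (date:2024 AND food), written by hand.
    day, account = p
    return (day.year == 2023 and "drinks" in account) or (
        day.year == 2024 and "food" in account
    )


spans = [
    (date(2023, 1, 1), date(2023, 12, 31)),  # date:2023
    (date(2024, 1, 1), date(2024, 12, 31)),  # date:2024
]
postings = [(date(2023, 12, 22), "expense:food")]

period, kept = envelope_report(postings, spans, matches)
print(period)
print(kept)
```

Here the envelope is 2023-01-01 through 2024-12-31, and the single posting is dropped because it matches neither branch of the query. The sketch does not address how a historical starting balance would be computed, which is one of the open questions in this thread.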
I agree in principle that many reports could do something useful with disjoint date periods. But I think pretty much all of them have modes that would give non-intuitive/broken results, as we've seen above and in #2177. Each report would need testing, design and enhancement work. Also keep in mind that certain kinds of queries working for only some reports/report modes would be a confusing UX; we prefer consistency where possible. Finally, this might be a cool power feature, but it's not a common need. So I don't have time for it myself, but if anyone would like to work on it, I can help test. A good first step would be a survey of current reports and their modes and a quick analysis of the impact of OR'd date periods.
Related: https://lemmy.world/post/14088295 shows one workaround for producing an income statement with disjoint subperiods (the first N days of each month; solution: extract just the transactions in those periods to a temp journal).
5be3ee9 disallowed date: in OR expressions. Moving forward with this fix for now.
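As an illustration of what "disallowing date: in OR expressions" can mean at the query-tree level, here is a minimal Python sketch. It is not the code from 5be3ee9, and hledger itself is written in Haskell; the Date, Acct, And and Or classes and both functions are hypothetical, loosely modelled on the constructors shown in the debug output earlier in this thread.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Date:
    span: str


@dataclass(frozen=True)
class Acct:
    pattern: str


@dataclass(frozen=True)
class And:
    terms: Tuple["Query", ...]


@dataclass(frozen=True)
class Or:
    terms: Tuple["Query", ...]


Query = Union[Date, Acct, And, Or]


def contains_date(q: Query) -> bool:
    """True if a Date term appears anywhere in this query tree."""
    if isinstance(q, Date):
        return True
    if isinstance(q, (And, Or)):
        return any(contains_date(t) for t in q.terms)
    return False


def has_date_under_or(q: Query) -> bool:
    """True if any Or node has a Date term somewhere beneath it."""
    if isinstance(q, Or):
        return any(contains_date(t) for t in q.terms)
    if isinstance(q, And):
        return any(has_date_under_or(t) for t in q.terms)
    return False


# (date:2023 AND drinks) OR (date:2024 AND food): would be rejected.
rejected = Or((
    And((Date("2023"), Acct("drinks"))),
    And((Date("2024"), Acct("food"))),
))

# date:..2024-12-31 AND (drinks OR food): a single date period, accepted.
accepted = And((Date("..2024-12-31"), Or((Acct("drinks"), Acct("food")))))

print(has_date_under_or(rejected))  # True
print(has_date_under_or(accepted))  # False
```

A check like this is deliberately conservative: it also rejects queries whose OR'd date periods happen to be identical or contiguous, trading some expressiveness for consistent report semantics.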
I am trying to use the balance command with an expr: query that includes a date: term, but I am getting incorrect results.
Here's a complete but very minimal example. I have the following journal file (with just one transaction):
and I am running the following command:
As you can see, balance takes into account the transaction that occurred on 2023-12-22 (which is the only transaction in this minimal example), even though it does not match the date term, which is date:2024-01 (i.e. January of 2024). Interestingly enough, if I remove the second operand of OR, which should have been inconsequential as it is a sub-expression that doesn't match anything, the problem goes away:
This is counterintuitive: since <subexpr2> matches nothing, <subexpr1> OR <subexpr2> should match exactly what <subexpr1> matches on its own, so adding or removing that operand should not change the balance.
Note that this problem does not affect print, as print seems to pick up the correct transactions. But it does affect register and possibly other commands.
Finally: