railslove / cmxl

your friendly MT940 SWIFT file parser for bank statements
http://railslove.com
MIT License
46 stars 25 forks source link

Minimize diff between entry and valuta dates #29

Closed grncdr closed 5 years ago

grncdr commented 5 years ago

Because entry_date is provided as MMDD, the year is ambiguos.

This changes the heuristic for choosing the year to one that mimizes the difference (in days) between valuta date and entry date.

See also: https://github.com/railslove/cmxl/issues/28 and https://github.com/railslove/cmxl/pull/16

grncdr commented 5 years ago

I haven't added another test fixture yet, but I will if general approach is ok for the maintainers.

bumi commented 5 years ago

:+1: thanks for the PR! @hapm @Uepsilon can you look at it?

I think we should add a few more specs for this case and also document that there is some magic happening and if somebody does not like the magic then data['entry_date'] can be accessed directly.

hapm commented 5 years ago

Looks good to me. I think thats the best you can do without accessing the corresponding fields from the opening and closing statements. Even though that is my currently preferred way to solve this issue and therefor I will fall back to data['entry_date'].

hapm commented 5 years ago

Only two concerns I would raise:

  1. I don't have enougth experience with lambdas, blocks, date arithmetics and other ruby stuff performance wise, but I would guess that the old implementation performed much better. If you work with big MT940 files this could raise problems. Did you check how this performs against the old implementation?
  2. This is totally based on my own preferred styles, but the implementiation imho is harder to read than the old one.
grncdr commented 5 years ago

Comparing performance and readability against a buggy/incomplete implementation seems a little unfair 😉.

Jokes aside, I would expect performance to be worse. Probably a version of the same algorithm can be written that handles each case with integer comparisons, but to me this seemed to express the intent more clearly 🤷‍♂️

I think thats the best you can do without accessing the corresponding fields from the opening and closing statements. Even though that is my currently preferred way to solve this issue

Could you explain this approach a bit more? I don't know exactly what you mean. It could be that a fix inside this method isn't really needed at all?

hapm commented 5 years ago

Could you explain this approach a bit more? I don't know exactly what you mean. It could be that a fix inside this method isn't really needed at all?

Unfortunatly there is no direct reference to this in the specs, but the entry date should be between the (intermediate) opening balance date (Statement.opening_or_intermediary_balance.date or field 60a) and the (intermediate) closing balance date (Statement.closing_or_intermediary_balance.date or field 62a) of the current statement, the transaction belongs to.

I would expect the MT940 to be malformed when the entry date is not between the two, even though the specs don't explicitly state that.

bumi commented 5 years ago

@Uepsilon can you have a look and merge this? I think we should release this fix.

I'd only would like to have a bit documentation around this and note somewhere that if somebody does not like the magic data['entry_date'] can be accessed directly, or we even add an accessor for that (something like entry_date_raw)?

thanks again for the contribution!

Uepsilon commented 5 years ago

hey @grncdr

thanks for the contribution and sorry for the delay.

This is a rather complexe issue. in theory, valuta (date in this code) does not have any connection to the entry_date.

valuta is the date the money is actually booked while the entry_date can be before or after. so this "fix" does not fix it but changes the assumption. it is, in theory, possible to have a valuta of 2019-01-01 yet theentry_date is 2019-12-31. strange and stupid but valid!

IMHO the standard is not sufficient with the entry-date format limited to month and day only.

this code adds some complexity which I'm not sure we need.

But if we had to assume - for the sake of simplicity - that the dates are always within a month range, something like

# we assume that valuta (date) and entry_date have a close connection. so valuta and entry_date should not be 
# further apart than one month. this leads to some edge cases

# valuta is in january while entry_date is in december => entry_date was done the year before
e_date = to_date(data['entry_date'], date.year - 1) if date.month == 1 && e_date.month == 12

# valuta is in december but entry_date is in january => entry_date is actually in the year after valuta
e_date = to_date(data['entry_date'], date.year + 1) if date.month == 12 && e_date.month == 1

would be way simpler processing and maintenance wise while having the same effect. it simply would do it only when needed and not every time.

also, since this is an edge case, I'd absolutely love some specs for that. otherwise it breaks when forgotten about. I'll add some

Uepsilon commented 5 years ago

draft: #32

@hapm @grncdr would like to hear your thoughts on this approach :)

hapm commented 5 years ago

It's a workable approach, if you want to estimate the entry date only based on the transaction, but for my use case it is much more plausible to get the raw entry date and calculate the year based on the date portion in fields 60a and 62a. I know that this approach introduces an ugly backward dependency between the transaction and the statement, but imho that is the intended way on how to get the year properly, based on the mt940 specs.

Uepsilon commented 5 years ago

i don't think that the entry date has anything in common with the opening (60a) or closing balance (62a). those two values simply tell what the balance was before and after the following statement.

there still could be a valuta between those with an entry_date from years ago.

it's best to not rely / use the entry date at all. valuta tells you when the money is booked, every other date is just some additional information.

Don't sell the skin till you have caught the bear

hapm commented 5 years ago

I probably don't understand the dates in 60a and 62a correctly then. For me entry_date is the booking date and date is the date of value, or the date the amount gets available. So my interpretation is exactly the oposit of yours, but for me it makes sence. If I request all transactions for a specific date, I would expect to get the transactions booked that date, and not valued that date. So if I continuously get transaction based on dates I get the full history of transactions booked in a given timeframe, not valued in that timeframe. And I can rely on the statement history to make a completion check of what was booked in that timeframe. My reasoning for this: I can never be sure to always have everything valued in a timeframe, because there can always be future transactions that would add to that timeframe. But I can be sure to have everything that was booked in a timeframe, because time passes on and booking can only happen at the date the transaction is entered in the system. Said in another way: The entry_date can not jump around imho and should be continuous, while the value date can jump around as needed to move the transaction to when it should be valued.

Uepsilon commented 5 years ago

since the standard specifies the first 6 digits to be valuta, this is the thing we should use and trust. valuta is the date the money hits the account. not the other way around. valuta will never be in the future

the 60a / 62a give you a timeframe ending at latest right now, so there is no room for future, yet unknown transactions to be included.

Uepsilon commented 5 years ago

closed in favor of #32