railslove / cmxl

your friendly MT940 SWIFT file parser for bank statements
http://railslove.com
MIT License
46 stars 25 forks source link

When the entry date is a month after a statement in january, its year is wrongly reduced by one #28

Closed hapm closed 5 years ago

hapm commented 5 years ago

As of commit d692be7dcfdc15e79328bdc215d3679e8237a488 there is a check in place for statements spanning over a year boundary. If a statement is parsed, that has an entry date in february and a date in january, the entry_date is wrongly set back to an earlier year. The following field 61 gets parsed with an entry_date of 2018-02-01 but should be 2019-02-01.

:61:1901310201DR1,6NMSCNONREF//XXXXXXXXXX

This happens, because there is the assumption that the entry_date is always before the date and not the other way around, but many banks date their monthly statements like that.

Instead of checking if entry_date.month is bigger than date.month, as a simple fix I suggest to check if entry_date.month is bigger than date.month+6. The same should be checked the other way around. if entry_date.month is smaller than date.month-6 than add a year to entry_date, to catch constalations like the one above when having a year in between:

:61:1812310101DR3498,06NTRFNONREF//XXXXXXXXXX

A more sophisticated fix would be to make sure that the date lies between the opening and closing dates (60F and 62F) of the statetement, and if not, move it there. This would require access to the statement and can therefor not be done while parsing the 61 field without coupling the Statement class.

Perhaps its easier to not guess the year at all, and allow the client to add his own year determination function? The current behaviour could stay in place as a default for backwards compatibility reasons.

bumi commented 5 years ago

puh, autsch... thanks for reporting this! that's a bad. need to wrap my head around it. (also ping @Uepsilon )

bumi commented 5 years ago

and thanks a lot for reporting and the great issue and analysis.

Uepsilon commented 5 years ago

hey @hapm, do you found this when dealing with real world data? if so, could you provide an example please?

clarifying start: the provided date is formated as follows:

during research i found some banks providing examples having the date (valuta) before the entry_date whilest others have it the other way around. this seems inconsistent. since I'd assume the valuta is the date the transaction should be considered completed while the entry date is the date the transaction was placed. so the only logical way for me is, that the valuta can be equal OR later than the entry date, not the other way around.

201902010101 is possible while 201902010202 is not.

also, if we think that it's indeed possible to plan transactions for the upcoming year, we might not even know what the entry date's year was at all when provided with 4 digits only. so we can only guess that it was the same year OR the year before if the valuta is somewhere near january. but that is a guess too.

your first example is not touched by any of the code as it happens in february. for the second line, i'd suggest not to check if the entry_date's month is bigger but to be precisely december. but then again, date is completely optional and not provided by a lot of banks at all.

hapm commented 5 years ago

Thanks for pointing out the bad example, I messed that up by copying the wrong line. I edited the example above, so it matches the real data I got from the bank with confidential information removed. My guess of why the (german) bank is booking it like this, is to make sure that the monthly interest is booked after the month is closed (therefor entry_date = first day of next month), but the interest it self belongs to the month (therefor date = last day of month the interest ist calculated for).

I tried to find a limitation in the specs, that permits or disallows that kind of statement to no avail. The only thing I could find was this:

The SWIFT System validates subfield 2, Entry Date (Date in reduced ISO form), using current System Year (Error code(s): T50).

--- SWIFT - Standard MT (Novemeber 2018) - Category 9 - Cash Management and Customer Status - MT 940 Field Specifications - 6. Field 61: Statement Line - NETWORK VALIDATED RULES

To be honest I didn't search very much for what is meant with System Year in this context, but the only reasonable conclusion I could come up with, was that the entry_date should always be between the dates of field 60a and 62a. Sadly I couldn't find a rule like that in the specs either.

To workaround this issue, I will directly access Statement.data['entry_date'] for now, and get the year based on this rule.

grncdr commented 5 years ago

We experienced the same issue at the start of February, in our case there was a line like:

:61:1901300201D1,00NCOMNONREF//XXXXXXXXX

during research i found some banks providing examples having the date (valuta) before the entry_date whilest others have it the other way around. this seems inconsistent. since I'd assume the valuta is the date the transaction should be considered completed while the entry date is the date the transaction was placed. so the only logical way for me is, that the valuta can be equal OR later than the entry date, not the other way around.

This seems totally logical to me, but apparently not to the bank 🙃

The fix suggested by @hapm makes sense to me, I'd phrase it as: the algorithm for choosing the entry dates year should minimize the difference between the two dates.

Uepsilon commented 5 years ago

61:1901300201D1,00NCOMNONREF//XXXXXXXXX

this line would not be affected by any change after all, as it would be valuta: 2019-01-30 entry_date: 2019-02-01 in any case, right?

hapm commented 5 years ago

Without any change the entry_date would be parsed as '2018-02-01', because

date.month == 1 && date.month < to_date(self.data['entry_date'], self.date.year).month

evaluates to true. With the given pull request this would be fixed. As there is no limit of the date field in the specs, I assume the given transaction could still be in a statement of 2020, and should therefor be booked in 2020 but valued in 2019. This doesn't have a real use case but I can't find anything against that in the specs.

Uepsilon commented 5 years ago

Without any change the entry_date would be parsed as '2018-02-01',

oops, true..

yeah, since the specs are unprecise about this, i'd rather ignore entry_date at all. if you need them for historical reasons (to check if you have processed that transaction already) it should make no difference as long as the resuling date after parsing is always the same.

but it's not really suitable for deciding when the booking was done

Uepsilon commented 5 years ago

fixed by #32