portfolio-performance / portfolio

Track and evaluate the performance of your investment portfolio across stocks, cryptocurrencies, and other assets.
http://www.portfolio-performance.info
Eclipse Public License 1.0

Improvement of the regular expression of the date in the PDF importers #4033

Closed: Nirus2000 closed this pull request 3 months ago

Nirus2000 commented 4 months ago

Improves the date regular expressions by replacing the month-name patterns for the formats MMMM (LLLL) and MMM (LLL) with a pattern defined as a static variable in ExtractorUtils.
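To illustrate the idea of one month-name pattern kept as a static constant and reused by every importer, here is a minimal sketch. The class MonthPatterns, the method findDate and the exact list of names are hypothetical (only German and English names are assumed here); it is not the pattern from the pull request.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a shared month-name fragment defined once as a static constant.
// Assumption: only German and English month names and abbreviations are covered.
final class MonthPatterns {
    // Longer alternatives are listed before their abbreviations ("January" before "Januar" before "Jan").
    // \u00e4 is "ä", written as an escape to avoid source-encoding issues.
    static final String MONTH = "(?:January|Januar|Jan|February|Februar|Feb"
            + "|M\u00e4rz|March|M\u00e4r|Mrz|Mar|April|Apr|Mai|May"
            + "|June|Juni|Jun|July|Juli|Jul|August|Aug"
            + "|September|Sept|Sep|October|Oktober|Oct|Okt"
            + "|November|Nov|December|Dezember|Dec|Dez)";

    // Example use: a "day month-name year" date such as "14. März 2024" or "3 Mar 2024".
    static final Pattern DATE = Pattern.compile("(?<date>\\d{1,2}\\.? " + MONTH + " \\d{4})");

    private MonthPatterns() {
    }

    // Returns the first date-like substring of the line, or null if none is found.
    static String findDate(String line) {
        Matcher m = DATE.matcher(line);
        return m.find() ? m.group("date") : null;
    }
}
```

Every importer would then reference MonthPatterns.MONTH instead of repeating its own month alternatives, so adding a new spelling only has to happen in one place.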

Removes the .replace("Mar", "Mar") workaround (JDK 7 vs. JDK 8 difference).

See https://github.com/portfolio-performance/portfolio/pull/4029#discussion_r1612042728 and the following comments.

@buchen -> https://github.com/portfolio-performance/portfolio/issues/2683

buchen commented 4 months ago

I am skeptical that this change will help us in the long run.

What problem exactly is this pull request trying to address?

My hunch is:

Where am I wrong?

Where I do see value is in moving the "Mrz" and "Mär" handling into the ExtractorUtil.asDate method, although, as I understand it, at the moment we only know of the Sutor bank abbreviating "März" as "Mrz". I will cherry-pick this change regardless of the rest of the discussion.
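A minimal sketch of what centralizing the alias handling could look like, assuming a "day month-name year" input. CentralDateParser and parseDayMonthYear are hypothetical names, not the real ExtractorUtil.asDate; the sketch resolves month names through an explicit alias table (including "Mrz" and "Mär") so the result does not depend on the month names shipped with a particular JDK.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the real ExtractorUtil.asDate: bank-specific month spellings
// are resolved in one central place instead of in each PDF importer.
final class CentralDateParser {
    // Day (1-2 digits, optional dot), month name (any letters, optional dot), 4-digit year.
    private static final Pattern DATE = Pattern.compile("(\\d{1,2})\\.?\\s+(\\p{L}+)\\.?\\s+(\\d{4})");

    // Assumption: keyed by the first three lower-case letters of the month name.
    // "M\u00e4r" covers "März"/"Mär", "mrz" covers the Sutor-style "Mrz".
    private static final Map<String, Month> MONTHS = Map.ofEntries(
            Map.entry("jan", Month.JANUARY), Map.entry("feb", Month.FEBRUARY),
            Map.entry("m\u00e4r", Month.MARCH), Map.entry("mrz", Month.MARCH),
            Map.entry("mar", Month.MARCH), Map.entry("apr", Month.APRIL),
            Map.entry("mai", Month.MAY), Map.entry("may", Month.MAY),
            Map.entry("jun", Month.JUNE), Map.entry("jul", Month.JULY),
            Map.entry("aug", Month.AUGUST), Map.entry("sep", Month.SEPTEMBER),
            Map.entry("okt", Month.OCTOBER), Map.entry("oct", Month.OCTOBER),
            Map.entry("nov", Month.NOVEMBER), Map.entry("dez", Month.DECEMBER),
            Map.entry("dec", Month.DECEMBER));

    private CentralDateParser() {
    }

    static LocalDate parseDayMonthYear(String text) {
        Matcher m = DATE.matcher(text.trim());
        if (!m.matches())
            throw new IllegalArgumentException("Unsupported date: " + text);

        String token = m.group(2).toLowerCase(Locale.ROOT);
        Month month = token.length() >= 3 ? MONTHS.get(token.substring(0, 3)) : null;
        if (month == null)
            throw new IllegalArgumentException("Unknown month name: " + m.group(2));

        // LocalDate.of rejects impossible dates (e.g. day 31 in April) with a DateTimeException.
        return LocalDate.of(Integer.parseInt(m.group(3)), month, Integer.parseInt(m.group(1)));
    }
}
```

With this design, a new bank abbreviation would be one extra map entry rather than a change in every importer's regular expression.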

buchen commented 4 months ago

And please, please: before making a change that takes a lot of work (I know looking up all month names in all languages takes a lot of work and diligence), we can also make a draft change first and then discuss it :-)

Nirus2000 commented 4 months ago

Hello @buchen, I understand. The first goal has already been achieved. As for the JDK 7 vs. JDK 8 issue: okay, your variant is smarter ;-)

The point is that with this pattern, a non-matching line fails faster than if we use .* or [\w]{3,4} or [\wä]{3,4} and then let Java check whether the match is a date. I therefore do not believe that this approach is slower. We match the pattern first and then check whether it is a valid date, right?
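The two strategies being compared can be sketched as follows: a loose month group like [\wä]{3,4} that lets every candidate through to a later date check, versus a strict list of month abbreviations that rejects non-dates already at the regex stage. The class PatternStrategies, the method candidates and the sample lines are hypothetical; the sketch only counts how many lines would reach the date check and makes no claim about actual running time.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical comparison of the two strategies from the discussion.
final class PatternStrategies {
    // Loose: any 3-4 word characters (or "ä") count as the month; validity is checked later.
    // Note: Java's \w does not match "ä" by default, hence the explicit [\wä].
    static final Pattern LOOSE = Pattern.compile("\\d{1,2}\\.? [\\w\u00e4]{3,4}\\.? \\d{4}");

    // Strict: only known month abbreviations (assumed German/English subset) pass,
    // optionally followed by the rest of the full name (e.g. "Mär" + "z").
    static final Pattern STRICT = Pattern.compile("\\d{1,2}\\.? "
            + "(?:Jan|Feb|M\u00e4r|Mrz|Mar|Apr|Mai|May|Jun|Jul|Aug|Sep|Okt|Oct|Nov|Dez|Dec)"
            + "\\w*\\.? \\d{4}");

    private PatternStrategies() {
    }

    // Number of lines that would be handed on to the (more expensive) date parsing step.
    static long candidates(Pattern pattern, List<String> lines) {
        return lines.stream().filter(line -> pattern.matcher(line).find()).count();
    }
}
```

For made-up lines such as "Ordernummer 12 Stk. 2024" or "Summe 5 Euro 2024", the loose pattern produces a candidate that only the date check can reject, while the strict pattern never matches them.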

So what would be the best pattern for the month? A universal one. 👍🏻

This only applies to month names; in my opinion, the dates differ in nothing else. The other parts are just one- or two-digit numbers; only the month name varies.

Alex 🔢