sisyphsu / dateparser

dateparser is a smart, high-performance date parser library. It supports hundreds of different formats, covering nearly every format in common use. It is also a showcase for the "retree" algorithm.
MIT License
95 stars 23 forks source link

Support Denmark formatted times HH.mm #2

Closed codegoddesskh closed 4 years ago

codegoddesskh commented 4 years ago

Support for Denmark formatted times like "20.52" or "10.21"

codegoddesskh commented 4 years ago

Also, I really appreciate dateparser. Thank you for building it!

codecov-io commented 4 years ago

Codecov Report

Merging #2 into master will not change coverage. The diff coverage is 100%.


@@            Coverage Diff            @@
##             master       #2   +/-   ##
=========================================
  Coverage     98.87%   98.87%           
  Complexity      165      165           
=========================================
  Files             4        4           
  Lines           357      357           
  Branches         46       46           
=========================================
  Hits            353      353           
  Misses            1        1           
  Partials          3        3
Impacted Files Coverage Δ Complexity Δ
.../github/sisyphsu/dateparser/DateParserBuilder.java 98.63% <100%> (ø) 17 <0> (ø) :arrow_down:


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4a94a88...ccb26be.

sisyphsu commented 4 years ago

Thanks for your feedback~

I checked Wikipedia, but didn't find the HH.mm format in these docs:

  1. https://en.wikipedia.org/wiki/Date_and_time_notation_in_Denmark
  2. https://en.wikipedia.org/wiki/Date_format_by_country

Could you provide more official information?

This pattern (?<hour>\\d{1,2})[:.](?<minute>\\d{1,2})(?::(?<second>\\d{1,2}))?(?:[.,](?<ns>\\d{1,9}))?(?<zero>z)? matches 20.52, 20.52:30, and 20.52:30.123456, which is a little weird.
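To make the concern concrete, here is a minimal sketch that compiles the same pattern with java.util.regex and prints which named groups capture what. This is an assumption for illustration only: dateparser uses its own "retree" matching, which may not behave identically to java.util.regex, and the class and method names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotTimePatternDemo {
    // The pattern quoted above, compiled with java.util.regex
    // (assumption: the library's own engine may differ in details).
    static final Pattern TIME = Pattern.compile(
        "(?<hour>\\d{1,2})[:.](?<minute>\\d{1,2})(?::(?<second>\\d{1,2}))?"
        + "(?:[.,](?<ns>\\d{1,9}))?(?<zero>z)?");

    // Returns "hour/minute/second/ns" for a full match, or null if there is no match.
    // Groups that did not participate in the match appear as the text "null".
    static String describe(String input) {
        Matcher m = TIME.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.group("hour") + "/" + m.group("minute") + "/"
            + m.group("second") + "/" + m.group("ns");
    }

    public static void main(String[] args) {
        String[] samples = {"20.52", "20.52:30", "20.52:30.123456", "20.52.30"};
        for (String s : samples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Under java.util.regex, the mixed form 20.52:30 is accepted, and 20.52.30 also matches, but with 30 captured by the ns group rather than second, because the seconds part only accepts a colon.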

Also, . is used in yyyy.MM and yy.MM.dd, and someone may want to use it to support yy.MM. So I think this commit could interfere with other patterns, and it isn't appropriate to include it in the default rules.

BTW, you could customize dateparser like this to make it support HH.mm:

// Register a custom rule for HH.mm; [.] matches only a literal dot (a bare . would match any character).
DateParserUtils.registerStandardRule("\\W*(?<hour>\\d{1,2})[.](?<minute>\\d{1,2})");
LocalDateTime dateTime = DateParserUtils.parseDateTime("15. jan. 2020 10.21");
assert dateTime.getHour() == 10;
assert dateTime.getMinute() == 21;

Hope this helps.

codegoddesskh commented 4 years ago

I ran into this issue with someone using one of my programs in Denmark, so their own examples pointed this out to me. Documentation is definitely scarce online! https://www.fyidenmark.com/danish_time.html

Also, here is a preview of the macOS default time format if I switch my region to Denmark in settings.

[image: denmark macOS]

matches 20.52, 20.52:30, 20.52:30.123456. this is a little weird.

Agreed! I should have added the third dot as well! Here is the formatting for more detailed time specifications for default Denmark:

[image: denmark long time formats]

(My real time zone is GMT-7, which is why the example shows that, even though Denmark is GMT+1 ;))

sisyphsu commented 4 years ago

Understood. I think you should use registerStandardRule to support Danish-formatted times, because it's dangerous to add this to the built-in rules. For example, 20.10.01 is widely read as 2020-10-01, not 20:10:01. Including HH.mm.ss rules in dateparser would cause more issues.
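The ambiguity can be illustrated with plain java.time, independent of dateparser: the same string parses cleanly under both a yy.MM.dd and an HH.mm.ss reading, so nothing in the text itself says which is meant. This is only a sketch; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class DottedAmbiguityDemo {
    // Reads the input as a two-digit year, month, and day.
    static LocalDate asDate(String s) {
        return LocalDate.parse(s, DateTimeFormatter.ofPattern("yy.MM.dd"));
    }

    // Reads the same input as hour, minute, and second.
    static LocalTime asTime(String s) {
        return LocalTime.parse(s, DateTimeFormatter.ofPattern("HH.mm.ss"));
    }

    public static void main(String[] args) {
        String input = "20.10.01";
        System.out.println("as date: " + asDate(input)); // 2020-10-01
        System.out.println("as time: " + asTime(input)); // 20:10:01
    }
}
```

Both calls succeed, so a default rule set would have to pick one reading. A caller who knows the input is Danish-formatted can make that choice explicitly with a custom rule.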

codegoddesskh commented 4 years ago

Totally understood, will include it on my side instead. Thanks for the recommendation about the best method to add it!