rezemika / humanized_opening_hours

A parser for the opening_hours fields from OpenStreetMap
GNU Affero General Public License v3.0

incomplete parsing when weekday is defined twice #28

Open nlehuby opened 5 years ago

nlehuby commented 5 years ago

The opening_hours of this place are: Mo-Fr 11:30-15:00, We-Mo 18:00-23:00

The plain text description goes like this:

>>> import humanized_opening_hours as hoh
>>> field = "Mo-Fr 11:30-15:00, We-Mo 18:00-23:00"
>>> oh = hoh.OHParser(field, locale="en")
>>> print(oh.plaintext_week_description(year=2018, weeknumber=1, first_weekday=0))
Monday: 6:00 PM – 11:00 PM
Tuesday: 11:30 AM – 3:00 PM
Wednesday: 6:00 PM – 11:00 PM
Thursday: 6:00 PM – 11:00 PM
Friday: 6:00 PM – 11:00 PM
Saturday: 6:00 PM – 11:00 PM
Sunday: 6:00 PM – 11:00 PM

whereas the expected output is:

Monday: 11:30 AM – 3:00 PM and 6:00 PM – 11:00 PM
Tuesday: 11:30 AM – 3:00 PM
Wednesday: 11:30 AM – 3:00 PM and 6:00 PM – 11:00 PM
Thursday: 11:30 AM – 3:00 PM and 6:00 PM – 11:00 PM
Friday: 11:30 AM – 3:00 PM and 6:00 PM – 11:00 PM
Saturday: 6:00 PM – 11:00 PM
Sunday: 6:00 PM – 11:00 PM

francois2metz commented 5 years ago

I created a failing test case here: https://github.com/francois2metz/humanized_opening_hours/commit/9f726b139c025405c068423599c4bbbfda25f9d0

rezemika commented 5 years ago

Thank you for your report!

It seems quite complicated to fix... I'll take a look at the code ASAP, but I'm not sure it's currently possible to fix this without rewriting a substantial part of the parsing logic (something that I'd like to do, but I currently don't have time for).

francois2metz commented 5 years ago

I looked at the code and managed to fix the problem, while breaking all other features ;)

I don't know if it's feasible to merge some rules when they have the same priority in the function get_current_rule. What do you think?

rezemika commented 5 years ago

Of course, it would be great, but it must be quite difficult to check the compatibility of rules before merging them. Theoretically, only rules separated by commas should be merged, and rules separated by semicolons should be mutually exclusive. It may be possible to make a patch, but I think handling this cleanly would require a major rewrite of the parsing logic. The grammar specifications are so complicated...
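To make the comma/semicolon distinction concrete, here is a minimal sketch of the intended semantics. It is not the library's code: `expand_days` and `build_week` are hypothetical helpers, rules are assumed to be pre-parsed into (days, timespans) pairs, and timespans are assumed not to cross midnight.

```python
from collections import defaultdict

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def expand_days(start, end):
    """Expand a weekday range, wrapping around the week (e.g. We-Mo)."""
    i, j = DAYS.index(start), DAYS.index(end)
    if i <= j:
        return DAYS[i:j + 1]
    return DAYS[i:] + DAYS[:j + 1]


def build_week(rule_groups):
    """Build a {day: [(start, end), ...]} mapping.

    rule_groups is a list of semicolon-separated groups. Each group is a
    list of comma-separated rules, and each rule is (days, timespans).
    This input shape is an assumption made for illustration only.
    """
    week = defaultdict(list)
    for group in rule_groups:
        touched = set()
        for days, spans in group:
            for day in days:
                if day not in touched:
                    # First time this group matches the day:
                    # it replaces whatever earlier groups defined.
                    week[day] = []
                    touched.add(day)
                # Rules inside the same group are merged.
                week[day].extend(spans)
    return {day: sorted(week[day]) for day in DAYS}


lunch = (expand_days("Mo", "Fr"), [("11:30", "15:00")])
dinner = (expand_days("We", "Mo"), [("18:00", "23:00")])

merged = build_week([[lunch, dinner]])       # "lunch, dinner"
exclusive = build_week([[lunch], [dinner]])  # "lunch; dinner"
```

With these inputs, `merged["Mo"]` holds both timespans, which is the output expected in the report, while `exclusive["Mo"]` holds only the evening one, which is what the parser currently prints.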

Currently, the main reason commas are supported is that they are very widely misused. They should be used only to merge opening hours of several days spanning midnight, but many fields use them as semicolons. For this reason, the parser actually treats commas as synonyms of semicolons, since opening hours spanning midnight are already supported with semicolons (the parser does not care about the separator).

Also, I think this problem is related to another one: when a specific period in a day is defined as closed, the whole day is considered closed. For example, with Mo-Sa 08:00-19:00; Fr 12:00-14:00 off, the whole Friday is considered closed (to prevent problems, I raise a ParseError in this case).
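One possible way to handle a partial "off" period is to subtract the closed interval from the day's open intervals instead of closing the whole day. The sketch below only illustrates that idea and is not taken from the library: `subtract_closed` and its helpers are hypothetical names, times are zero-padded "HH:MM" strings, and no timespan crosses midnight.

```python
def to_minutes(hhmm):
    """Convert "HH:MM" to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(minutes):
    """Convert minutes since midnight back to "HH:MM"."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def subtract_closed(open_spans, closed_span):
    """Remove closed_span from every span in open_spans."""
    c_start, c_end = map(to_minutes, closed_span)
    result = []
    for start, end in open_spans:
        s, e = to_minutes(start), to_minutes(end)
        if c_end <= s or c_start >= e:
            # No overlap: keep the open span unchanged.
            result.append((start, end))
            continue
        if s < c_start:
            # Part of the open span before the closed period.
            result.append((start, to_hhmm(c_start)))
        if c_end < e:
            # Part of the open span after the closed period.
            result.append((to_hhmm(c_end), end))
    return result


friday = subtract_closed([("08:00", "19:00")], ("12:00", "14:00"))
```

For the example field, `friday` ends up with two spans, 08:00-12:00 and 14:00-19:00, rather than an empty (closed) day.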