rezemika / humanized_opening_hours

A parser for the opening_hours fields from OpenStreetMap
GNU Affero General Public License v3.0

Parsing Failure against Non-standard DoW and presence of 'AM' and / or 'PM' #29

Status: closed by TariqAHassan 5 years ago

TariqAHassan commented 5 years ago

The parser fails on opening-hours values that contain nonstandard day-of-week abbreviations or that use 'AM' and/or 'PM'.

For example, "Mon-Sat 11:30AM-10PM" fails on both counts (see OSM node 30899821). The missing ":00" in "10PM" also seems to confuse it.

import humanized_opening_hours as hoh

hoh.OHParser("Mon-Sat 11:30AM-10PM")

Error:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/humanized_opening_hours/main.py", line 326, in __init__
    self.field, optimize
  File "/anaconda3/lib/python3.6/site-packages/humanized_opening_hours/field_parser.py", line 296, in get_tree_and_rules
    tree = PARSER.parse(field)
  File "/anaconda3/lib/python3.6/site-packages/lark/lark.py", line 223, in parse
    return self.parser.parse(text)
  File "/anaconda3/lib/python3.6/site-packages/lark/parser_frontends.py", line 118, in parse
    return self.parser.parse(text)
  File "/anaconda3/lib/python3.6/site-packages/lark/parsers/xearley.py", line 130, in parse
    column = scan(i, column)
  File "/anaconda3/lib/python3.6/site-packages/lark/parsers/xearley.py", line 119, in scan
    raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect for item in to_scan}, set(to_scan))
lark.exceptions.UnexpectedCharacters: No terminal defined for 'n' at line 1 col 3
Mon-Sat 11:30AM-10PM
  ^
Expecting: {Terminal('MINUS'), Terminal('CLOSED'), Terminal('COMMA'), Terminal('__IGNORE_0'), Terminal('OPEN')}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-36-384ae46503c2>", line 1, in <module>
    hoh.OHParser(opening_hours)
  File "/anaconda3/lib/python3.6/site-packages/humanized_opening_hours/main.py", line 332, in __init__
    col=e.column
humanized_opening_hours.exceptions.ParseError: The field could not be parsed, it may be invalid. Error happened on column 3.

For the string to be parsed correctly, it has to be rewritten as 'Mo-Sa 11:30-22:00'.
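The rewrite from "Mon-Sat 11:30AM-10PM" to "Mo-Sa 11:30-22:00" is mechanical enough to automate before the field reaches the parser. What follows is a minimal sketch, not part of humanized_opening_hours or oh_sanitizer: `normalize_oh`, `DAY_CODES`, `DAY_RE` and `TIME_RE` are hypothetical names, and it assumes English day names and 12-hour times written as `H` or `H:MM` followed by AM/PM.

```python
import re

# Hypothetical lookup (not from humanized_opening_hours): English day names and
# abbreviations mapped to the two-letter codes of the OSM opening_hours syntax.
DAY_CODES = {
    "mon": "Mo", "monday": "Mo",
    "tue": "Tu", "tues": "Tu", "tuesday": "Tu",
    "wed": "We", "wednesday": "We",
    "thu": "Th", "thur": "Th", "thurs": "Th", "thursday": "Th",
    "fri": "Fr", "friday": "Fr",
    "sat": "Sa", "saturday": "Sa",
    "sun": "Su", "sunday": "Su",
}

# Whole-word matches only, so keywords such as "sunrise" are left untouched.
DAY_RE = re.compile(r"\b(" + "|".join(DAY_CODES) + r")\b", re.IGNORECASE)

# A 12-hour time such as "10PM", "11:30AM" or "9 a.m.":
# hour, optional minutes, then the A/P marker.
TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*([AaPp])\.?[Mm]\.?")


def _to_24h(match):
    hour = int(match.group(1))
    minute = match.group(2) or "00"  # "10PM" has no minutes, so supply ":00"
    if not 1 <= hour <= 12:
        raise ValueError(f"not a 12-hour time: {match.group(0)!r}")
    hour %= 12  # 12AM and 12PM both start from 0 before the PM offset
    if match.group(3).lower() == "p":
        hour += 12
    # 12AM becomes "00:00"; an end-of-day closing time may need "24:00" instead.
    return f"{hour:02d}:{minute}"


def normalize_oh(field):
    """Rewrite English day names and AM/PM times into OSM opening_hours form."""
    field = DAY_RE.sub(lambda m: DAY_CODES[m.group(1).lower()], field)
    return TIME_RE.sub(_to_24h, field)


print(normalize_oh("Mon-Sat 11:30AM-10PM"))  # Mo-Sa 11:30-22:00
```

Already-valid values pass through unchanged, because the patterns only match English day names and times carrying an AM/PM marker. Anything outside those assumptions (other languages, "noon", ranges like "10-2pm") is left for a fuller sanitizer.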


Great job on the package overall. I know that this is a hard problem!

rezemika commented 5 years ago

Thank you for your report! However, this is expected behavior.

Indeed, it is theoretically possible to fix this field automatically, but the code currently does nothing to strip "AM"/"PM" or to correct weekday names. This may be added later, but unfortunately I am very busy these days. If handling these errors is crucial for you, you can try to sanitize the field before parsing it with this library: https://github.com/rezemika/oh_sanitizer
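The "fix the field before parsing" approach can be wrapped in a small retry helper so that well-formed values never go through the sanitizer. This is a sketch under stated assumptions: `parse_with_fallback` is a hypothetical helper, not an API of humanized_opening_hours or oh_sanitizer, and `sanitize` stands for any callable that returns a cleaned string.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def parse_with_fallback(
    field: str,
    parse: Callable[[str], T],
    sanitize: Callable[[str], str],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Parse `field` as-is; if that raises one of `errors`, sanitize and retry once."""
    try:
        return parse(field)
    except errors:
        cleaned = sanitize(field)
        if cleaned == field:
            # Sanitizing changed nothing, so a retry would fail the same way:
            # re-raise the original parse error instead.
            raise
        return parse(cleaned)
```

With humanized_opening_hours, a call might look like `parse_with_fallback(field, hoh.OHParser, sanitize, (ParseError,))`, where `ParseError` is imported from `humanized_opening_hours.exceptions` (the exception class shown in the traceback above). Narrowing `errors` this way keeps unrelated bugs from silently triggering a retry.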

Sorry for the inconvenience!

TariqAHassan commented 5 years ago

@rezemika

No worries. Thanks for the reply.