karlicoss / orgparse

Python module for reading Emacs org-mode files
https://orgparse.readthedocs.org
BSD 2-Clause "Simplified" License

Non-existent date errors without context #62

Open hrehfeld opened 1 year ago

hrehfeld commented 1 year ago

2011-04-31 is not a valid date; April only has 30 days.

Test case:

** test
<2011-04-31 Sat>

leads to:

Traceback (most recent call last):
  File "/home/hrehfeld/projects/2023/topics/orgmode.py", line 31, in <module>
    doc = orgparse.load(filepath, make_env(filepath))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/__init__.py", line 140, in load
    return load(orgfile, env)
           ^^^^^^^^^^^^^^^^^^
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/__init__.py", line 148, in load
    return loadi(all_lines, filename=filename, env=env)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/__init__.py", line 168, in loadi
    return parse_lines(lines, filename=filename, env=env)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/node.py", line 1464, in parse_lines
    node._parse_pre()
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/node.py", line 1151, in _parse_pre
    self._body_lines = list(ilines)
                       ^^^^^^^^^^^^
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/node.py", line 1202, in _iparse_timestamps
    self._timestamps.extend(OrgDate.list_from_str(l))
                            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/date.py", line 471, in list_from_str
    odate = cls(
            ^^^^
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/date.py", line 227, in __init__
    self._start = self._to_date(start)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/date.py", line 238, in _to_date
    return datetime.date(*date)
           ^^^^^^^^^^^^^^^^^^^^
ValueError: day is out of range for month

I'd expect orgparse either to parse the date somehow, or provide context where the error happens. This is probably as easy as augmenting the ValueError with location info.
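
To illustrate what "augmenting the ValueError with location info" could look like, here is a minimal standalone sketch. It does not use orgparse internals: `find_dates`, `_TS_RE` and the `filename` parameter are hypothetical names made up for this example, and the regex only looks at the date part of a timestamp.

```python
import datetime
import re

# Hypothetical pattern: matches only the date part of an active (<...>)
# or inactive ([...]) Org timestamp. Not the pattern orgparse uses.
_TS_RE = re.compile(r"[<\[](\d{4})-(\d{2})-(\d{2})")


def find_dates(lines, filename="<string>"):
    """Return all dates found in lines; raise ValueError naming the bad line."""
    dates = []
    for lineno, line in enumerate(lines, start=1):
        for match in _TS_RE.finditer(line):
            year, month, day = map(int, match.groups())
            try:
                dates.append(datetime.date(year, month, day))
            except ValueError as exc:
                # Re-raise with file name, line number and the offending text,
                # keeping the original exception as the cause.
                raise ValueError(
                    f"{filename}:{lineno}: invalid date in {line.strip()!r}: {exc}"
                ) from exc
    return dates
```

With the test case above, the error message would start with something like `notes.org:2: invalid date in '<2011-04-31 Sat>'`, which is enough to locate the problem in a large file.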

maikol-solis commented 5 months ago

I'm using hyperorg and I'm having a similar issue. The problem is that I don't know where to look to fix it.

$ hyperorg -v roam roam_html

INFO : Input from: roam
Traceback (most recent call last):
  File "/Users/maikol/.local/bin/hyperorg", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/hyperorg/__main__.py", line 136, in main
    node_count = reader.read_org_files()
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/hyperorg/reader.py", line 105, in read_org_files
    orgparse_nodes_list = self._orgparse_obj_from_input_dir()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/hyperorg/reader.py", line 63, in _orgparse_obj_from_input_dir
    result.append(orgparse.load(fp))
                  ^^^^^^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/__init__.py", line 138, in load
    return load(orgfile, env)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/__init__.py", line 146, in load
    return loadi(all_lines, filename=filename, env=env)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/__init__.py", line 166, in loadi
    return parse_lines(lines, filename=filename, env=env)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/node.py", line 1457, in parse_lines
    node._parse_pre()
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/node.py", line 1144, in _parse_pre
    self._body_lines = list(ilines)
                       ^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/node.py", line 1193, in _iparse_timestamps
    self._timestamps.extend(OrgDate.list_from_str(self._heading))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/date.py", line 471, in list_from_str
    odate = cls(
            ^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/date.py", line 227, in __init__
    self._start = self._to_date(start)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/maikol/Library/Application Support/pipx/venvs/hyperorg/lib/python3.12/site-packages/orgparse/date.py", line 238, in _to_date
    return datetime.date(*date)
           ^^^^^^^^^^^^^^^^^^^^
ValueError: month must be in 1..12

buhtz commented 5 months ago

Dear Maikol, I am not the maintainer of orgparse but of Hyperorg. In the upcoming release, Hyperorg will catch orgparse-related exceptions and won't stop the whole parsing process (fixed in #120, not yet released). However, the affected node won't get parsed and is lost.

> I'd expect orgparse either to parse the date somehow, or provide context where the error happens. This is probably as easy as augmenting the ValueError with location info.

From my perspective as maintainer of Hyperorg, I see no problem with orgparse raising an exception. orgparse is a parser and nothing more. I prefer orgparse being strict about this. Also, the error is quite clearly described.

I see no need for a fix on the side of orgparse. Higher-level applications that use packages like orgparse are responsible for handling such exceptions.

karlicoss commented 5 months ago

Yeah, not sure what "somehow" would mean if the date is wrong, but I agree the exception could be more descriptive and include location information. Another option is to have multiple "error policies": the default would be strict and throw exceptions, but with a more defensive policy specified by the user, orgparse could ignore the offending node or Org block completely.
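
A rough sketch of how such error policies might look, assuming a strict default and a skip mode. Everything here (`ErrorPolicy`, `parse_dates`, the regex) is hypothetical and only illustrates the idea; it is not existing orgparse API.

```python
import datetime
import enum
import re


class ErrorPolicy(enum.Enum):
    STRICT = "strict"  # raise on the first invalid date (default)
    SKIP = "skip"      # silently drop invalid timestamps


# Hypothetical pattern: date part of an active Org timestamp only.
_DATE_RE = re.compile(r"<(\d{4})-(\d{2})-(\d{2})")


def parse_dates(text, policy=ErrorPolicy.STRICT):
    """Collect dates from text, handling invalid ones according to policy."""
    dates = []
    for match in _DATE_RE.finditer(text):
        year, month, day = (int(group) for group in match.groups())
        try:
            dates.append(datetime.date(year, month, day))
        except ValueError:
            if policy is ErrorPolicy.STRICT:
                raise
            # ErrorPolicy.SKIP: ignore the offending timestamp and continue.
    return dates
```

The strict default keeps current behaviour for existing callers, while an application that prefers partial results could opt into `ErrorPolicy.SKIP` explicitly.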

buhtz commented 5 months ago

I would recommend asking on the emacs-orgmode mailing list for advice.

I tested org-html-export-as-html:

<2011-04-35 Sat>

becomes

<span class="timestamp-wrapper"><span class="timestamp">&lt;2011-05-05 Do&gt;</span></span>

yantar92 commented 5 months ago

<2011-04-35 Sat>

From the Org mode syntax perspective, there is no requirement that a timestamp's date represent a valid date. A separate question is how to interpret a date/time that does not look like a normal date.

As an implementation detail, Org mode uses the Emacs time API, which uses POSIX mktime. As you can see in https://www.gnu.org/software/libc/manual/html_node/Broken_002ddown-Time.html, mktime allows the year/month/day to be outside their normal ranges; they will be normalized.

The same goes for the time component. Something like <2011-04-01 26:00> is 2am on 2011-04-02. Note that this last representation is actually used in the wild.
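
For comparison, a similar normalization can be approximated in Python with plain `datetime` arithmetic. This is only a sketch: `normalize` is a hypothetical helper, it does not call mktime, and it ignores time zones and DST, so it may differ from what Emacs does in edge cases.

```python
import datetime


def normalize(year, month, day, hour=0, minute=0):
    """Fold out-of-range month/day/hour/minute values into a valid datetime."""
    # Carry out-of-range months into the year (e.g. month 13 -> January next year).
    year_offset, month_index = divmod(month - 1, 12)
    base = datetime.datetime(year + year_offset, month_index + 1, 1)
    # Let timedelta carry overflowing days, hours and minutes forward.
    return base + datetime.timedelta(days=day - 1, hours=hour, minutes=minute)


print(normalize(2011, 4, 35))         # 2011-05-05 00:00:00
print(normalize(2011, 4, 1, 26, 0))   # 2011-04-02 02:00:00
```

Under these assumptions, <2011-04-35 Sat> maps to 2011-05-05, matching the org-html-export-as-html result above, and <2011-04-31 Sat> from the original report would become 2011-05-01.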