karlicoss / orgparse

Python module for reading Emacs org-mode files
https://orgparse.readthedocs.org
BSD 2-Clause "Simplified" License

Date parsing with timezone information #22

Closed vaab closed 2 years ago

vaab commented 3 years ago

My CLOCK lines are generated this way:

  CLOCK: [2020-09-09 Wed 09:04 CEST]--[2020-09-09 Wed 09:04 CEST] =>  0:00

Context

Notice the timezone information. I need to be able to parse it to get non-naive (timezone-aware) datetimes. I travel as I work, and these clock entries are generated in local time all over the world.

They then need to be converted to my customers' timezones.
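
For illustration, here is a minimal sketch of such a conversion using only the standard library. The fixed offsets for CEST and EDT are assumptions made up for this example (the customers' actual zones are not stated here), and real code would more likely use named zones such as `zoneinfo.ZoneInfo("Europe/Paris")` so that daylight saving changes are handled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed-offset stand-ins for the two zones in this example.
CEST = timezone(timedelta(hours=2), "CEST")   # where the clock entry was made
EDT = timezone(timedelta(hours=-4), "EDT")    # hypothetical customer timezone

# The start of the CLOCK entry above, as a timezone-aware datetime.
clock_start = datetime(2020, 9, 9, 9, 4, tzinfo=CEST)

# astimezone() keeps the same instant and only changes the wall-clock view.
customer_time = clock_start.astimezone(EDT)
print(customer_time.isoformat())  # 2020-09-09T03:04:00-04:00
```

A naive datetime carries no offset, so this conversion is only well defined once the timezone has been parsed.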

About non-naive datetime

It seems quite desirable to handle only non-naive datetimes anyway. Note also that I don't think I've done anything special to change the time format of emacs.

Analysis of code

First, orgparse doesn't recognize this because it does not expect the timezone information. I'm able to fix that, but then what should be done with the timezone information? python's datetime is a really big mess and does not know how to solve this alone:

Otherwise you must create yourself a tzinfo object...

So, the only way to do this correctly is:

How do you view this problem and are you interested to support full non-naive time parsing ? I'm able to send you a PR, but won't do it without your approval, as it may imply some decisions about adding a dependency to orgparse.

karlicoss commented 3 years ago

Hi!

Oh, interesting, I think it's the first time I've seen the timezone in org-mode dates!

Note also that I don't think I've done anything special to change the time format of emacs.

Are you sure? I did a quick web search and also grepped org-mode sources, and it doesn't look like there is native timezone support.

How do you view this problem and are you interested to support full non-naive time parsing ?

That said, org-mode is extensible and I think it would be an interesting problem to solve within orgparse too, so I would be happy to help!

python's datetime is a really big mess and does not know how to solve this alone:

Yep! I've had to parse tz info with strptime several times, and every time I ended up with some hack like splitting off the timezone substring and parsing it separately by hand, or something similarly horrible.
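
A minimal sketch of that kind of split-and-parse workaround, assuming timestamps shaped like the one in the CLOCK line above. `TZ_ABBREVIATIONS`, `STAMP_RE` and `parse_stamp` are hypothetical names, not part of orgparse, and the abbreviation table is hand-written because abbreviations such as "CST" are ambiguous and cannot be resolved reliably in general.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: a small hand-maintained table of fixed offsets.
TZ_ABBREVIATIONS = {
    "UTC": timezone.utc,
    "CET": timezone(timedelta(hours=1), "CET"),
    "CEST": timezone(timedelta(hours=2), "CEST"),
}

# Matches the inside of a timestamp such as "2020-09-09 Wed 09:04 CEST";
# the day name and the timezone abbreviation are both optional.
STAMP_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})"
    r"(?: +(?P<dayname>[^\W\d_]+))?"
    r" +(?P<time>\d{1,2}:\d{2})"
    r"(?: +(?P<tz>[A-Z]{2,5}))?$"
)


def parse_stamp(text):
    """Return an aware datetime if a known abbreviation is present, else a naive one."""
    match = STAMP_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    # Parse only the numeric part; the day name is skipped because %a is locale-dependent.
    naive = datetime.strptime(f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M")
    abbrev = match["tz"]
    if abbrev is None:
        return naive
    if abbrev not in TZ_ABBREVIATIONS:
        raise ValueError(f"unknown timezone abbreviation: {abbrev!r}")
    return naive.replace(tzinfo=TZ_ABBREVIATIONS[abbrev])


start = parse_stamp("2020-09-09 Wed 09:04 CEST")
print(start.isoformat())  # 2020-09-09T09:04:00+02:00
```

Timestamps without an abbreviation still come back as naive datetimes, so existing files would keep parsing the same way.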

I'm able to send you a PR, but won't do it without your approval, as it may imply some decisions about adding a dependency to orgparse.

Appreciate checking in advance :)


In terms of supporting this, I think you'll need to modify these bits:


Regarding the extra dependency -- in theory, I wouldn't mind as long as it's not a required dependency. E.g. say you want to use a library named tzparser:

import warnings

try:
    import tzparser  # hypothetical name (or whatever you want to use)
    tz = tzparser.parse(tzstr)  # tzstr: the timezone substring, e.g. "CEST"
except ImportError:
    warnings.warn("Please install tzparser to parse timezone information")
    tz = None

As for including this in orgparse -- considering it's not standard org-mode (but again, correct me if I'm wrong!), I'm not sure how I feel about it. However, I think you could solve it without forking the library by monkey patching orgparse from your own code:

import orgparse.date
orgparse.date.TIMESTAMP_RE = ...
orgparse.date.OrgDate._daterange_from_groupdict = ...
orgparse.date.OrgDate._to_date = ...

# rest of your code that uses orgparse

That way the rest of the library will use the modified version of the code, and hopefully handle timezone info. Do you think that could work for you?

karlicoss commented 2 years ago

Are you sure? I did a quick web search and also grepped org-mode sources, and it doesn't look like there is native timezone support.

Closing this since I'm still not sure whether it's actually standard org-mode, but feel free to reopen if it's a common use case for some org-mode plugin that you think we could support.