Closed slidenerd closed 9 months ago
I'm not the developer, but they do document it here: https://feedparser.readthedocs.io/en/latest/date-parsing.html#advanced-date
Different feed types and versions use wildly different date formats. Universal Feed Parser will attempt to auto-detect the date format used in any date element, and parse it into a standard Python 9-tuple in UTC.
So I believe that to create a timezone-aware datetime object, you would do something like:
from time import mktime
from datetime import datetime, timezone
datetime.fromtimestamp(mktime(pubdate_parsed), timezone.utc)
I'm wondering if it's more correct to call calendar.timegm() rather than mktime().
The documentation has a chart that explains that mktime() assumes your struct_time is in local time, whereas timegm() assumes UTC. Is it possible that mktime() is working by accident if localtime is set to UTC for you?
I find this stuff genuinely confusing.
Indeed I'm getting different results with this when I change the TZ env var:
#!/usr/bin/python3
# Compare mktime() (reads the tuple as local time) with timegm() (reads it as UTC).
import time
from calendar import timegm
from datetime import datetime, timezone
from time import mktime, tzname
# Show which timezone the process is using (controlled by the TZ env var).
print(f"tzname: {tzname}, timezone: {time.timezone}")
# A 9-tuple in UTC, like the ones feedparser returns.
pubdate_parsed = time.struct_time((2022, 8, 4, 5, 53, 42, 3, 216, 0))
pub_mktime = datetime.fromtimestamp(mktime(pubdate_parsed), timezone.utc)
pub_timegm = datetime.fromtimestamp(timegm(pubdate_parsed), timezone.utc)
print(f"pubdate_parsed: {pubdate_parsed}")
print(f" mktime: {pub_mktime}")
print(f" timegm: {pub_timegm}")
root@69655a7e9e27:/usr/src/app# TZ='UTC' python try.py
tzname: ('UTC', 'UTC'), timezone: 0
pubdate_parsed: time.struct_time(tm_year=2022, tm_mon=8, tm_mday=4, tm_hour=5, tm_min=53, tm_sec=42, tm_wday=3, tm_yday=216, tm_isdst=0)
mktime: 2022-08-04 05:53:42+00:00
timegm: 2022-08-04 05:53:42+00:00
root@69655a7e9e27:/usr/src/app# TZ='EST' python try.py
tzname: ('EST', 'EST'), timezone: 18000
pubdate_parsed: time.struct_time(tm_year=2022, tm_mon=8, tm_mday=4, tm_hour=5, tm_min=53, tm_sec=42, tm_wday=3, tm_yday=216, tm_isdst=0)
mktime: 2022-08-04 10:53:42+00:00
timegm: 2022-08-04 05:53:42+00:00
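Based on the output above, a minimal sketch of a conversion that does not depend on the TZ setting might look like the following. It assumes pubdate_parsed is a struct_time that is already in UTC, as the feedparser documentation describes. The helper names to_aware_utc and to_aware_utc_direct are made up for illustration and are not part of feedparser.

```python
import time
from calendar import timegm
from datetime import datetime, timezone


def to_aware_utc(parsed: time.struct_time) -> datetime:
    """Convert a UTC struct_time to an aware datetime via a timestamp."""
    # timegm() reads the tuple as UTC; mktime() would read it as local time.
    return datetime.fromtimestamp(timegm(parsed), timezone.utc)


def to_aware_utc_direct(parsed: time.struct_time) -> datetime:
    """Same result without going through a timestamp (hypothetical helper)."""
    # The first six fields are year, month, day, hour, minute, second.
    return datetime(*parsed[:6], tzinfo=timezone.utc)


# Same sample value as in the script above.
pubdate_parsed = time.struct_time((2022, 8, 4, 5, 53, 42, 3, 216, 0))
print(to_aware_utc(pubdate_parsed))         # 2022-08-04 05:53:42+00:00
print(to_aware_utc_direct(pubdate_parsed))  # 2022-08-04 05:53:42+00:00
```

Neither helper calls mktime(), so the result should be the same whether TZ is UTC, EST, or anything else.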
I think this question belongs here rather than on Stack Overflow because, as the library author, you would be able to answer it best.
Issues I referenced before asking: https://github.com/kurtmckee/feedparser/issues/212 and https://github.com/kurtmckee/feedparser/issues/51
Problem
How to reproduce this problem
Both insert statements above will fail
What have I found so far?
I found 3 methods, but each seems to have a limitation.
Method 1
Convert it with strptime
I could do this
I am guessing this would raise an error if a feed returns a date in an unexpected format, and I am not sure whether it works when a leap second is added.
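To make the concern about unexpected formats concrete, here is a minimal sketch of the strptime approach. The format string is an assumption (an RFC 822 style date with a numeric offset), and parse_pubdate is a hypothetical helper name, not something feedparser provides.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed format, e.g. "Thu, 04 Aug 2022 05:53:42 +0000".
ASSUMED_FORMAT = "%a, %d %b %Y %H:%M:%S %z"


def parse_pubdate(raw: str) -> Optional[datetime]:
    """Parse a raw date string; return None if it does not match the format."""
    try:
        # %z makes the result timezone-aware.
        parsed = datetime.strptime(raw, ASSUMED_FORMAT)
    except ValueError:
        # strptime raises ValueError for any string that does not match.
        return None
    # Normalize to UTC so every stored value uses the same offset.
    return parsed.astimezone(timezone.utc)


print(parse_pubdate("Thu, 04 Aug 2022 00:53:42 -0500"))  # 2022-08-04 05:53:42+00:00
print(parse_pubdate("2022-08-04T05:53:42Z"))             # None
```

A feed that writes GMT instead of a numeric offset would not match this format either, which is the kind of variation a single strptime pattern cannot cover.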
Method 2
This seems to lose the timezone information completely, or am I wrong about that? What happens here when DST is in effect?
Method 3
Requires a third-party library called dateutil, as shown here: https://stackoverflow.com/a/18726020/5371505
Question
Thank you for your time