jellyfin / jellyfin-kodi

Jellyfin Plugin for Kodi
https://jellyfin.org
GNU General Public License v3.0
820 stars 111 forks source link

save_last_sync try to convert HTTP date header with a localized python date-time parser which is incorrect #916

Open prahal opened 2 days ago

prahal commented 2 days ago

Describe the bug When the locale has a month name format "%b" which is not English save_last sync raise an exception because it cannot parse the server-date (server-date is the HTTP header "Date" of the HTTP request, from .kodi/addons/plugin.video.jellyfin/jellyfin_kodi/jellyfin/http.py: self.config.data["server-time"] = r.headers["Date"]).

HTTP Date has three format https://datatracker.ietf.org/doc/html/rfc9110.html#section-5.6.7, the "preferred format is a fixed-length and single-zone subset of the date and time specification used by the Internet Message Format [RFC5322]". That is https://datatracker.ietf.org/doc/html/rfc5322#section-3.3.

strptime is wrong as HTTP Date is not localized, we should use a dedicated function.

Seems the closest to parse HTTP date is to parse it as an email date with email.utils.parsedate_to_datetime. https://stackoverflow.com/questions/1471987/how-do-i-parse-an-http-date-string-in-python (not that I don't have enough reputation to comment on the wrong answers that is rfc822 which is not more appropriate that email.utils as rfc822 is the obsolete RFC for email dates (the current one is https://datatracker.ietf.org/doc/html/rfc2822#section-3.3 which is what email.utils.parse is about), not an RFC for HTTP dates. And the strptime answer that as this bug re'port shows does not work on localized systems, i.e. with an LC_TIME not English).

Also see the discussion in https://github.com/jellyfin/jellyfin-kodi/issues/288#issuecomment-617950518 I disagree that ISO8601 is the way to go because server-date is about the HTTP date. We will have to parse the HTTP date format either way, so why not keep it the IMF format from RFC5322?

The issue is that we parse a non localized datetime string with a localized function, not that the format is wrong.

But do we agree to keep RFC5322 IMF date format?

Do we agree to use email.utils.parsedate_to_datetime (it is python 3.3 and above) or should I use email.utils.parse and then convert to datetime?

And is it fine to use email.utils.parsedate rfc2822 even if I believe it does not cope with the two obsolete HTTP date formats that the RFC9110 says we MUST be able to read?

  Sunday, 06-Nov-94 08:49:37 GMT   ; obsolete RFC 850 format
  Sun Nov  6 08:49:37 1994         ; ANSI C's asctime() format

Maybe test cases would be good too. Probably a test case with a locale with non-English month names, one that 2 digit year, a non rfc822 date string. Do you see others?

Edit: wed could also include djjango parse_http_date codfe which seems straightforward: https://github.com/django/django/blob/stable/5.1.x/django/utils/http.py#L97

MONTHS = "jan feb mar apr may jun jul aug sep oct nov dec".split()
__D = r"(?P<day>[0-9]{2})"
__D2 = r"(?P<day>[ 0-9][0-9])"
__M = r"(?P<mon>\w{3})"
__Y = r"(?P<year>[0-9]{4})"
__Y2 = r"(?P<year>[0-9]{2})"
__T = r"(?P<hour>[0-9]{2}):(?P<min>[0-9]{2}):(?P<sec>[0-9]{2})"
RFC1123_DATE = _lazy_re_compile(r"^\w{3}, %s %s %s %s GMT$" % (__D, __M, __Y, __T))
RFC850_DATE = _lazy_re_compile(r"^\w{6,9}, %s-%s-%s %s GMT$" % (__D, __M, __Y2, __T))
ASCTIME_DATE = _lazy_re_compile(r"^\w{3} %s %s %s %s$" % (__M, __D2, __T, __Y))

def parse_http_date(date):
    """
    Parse a date format as specified by HTTP RFC 9110 Section 5.6.7.

    The three formats allowed by the RFC are accepted, even if only the first
    one is still in widespread use.

    Return an integer expressed in seconds since the epoch, in UTC.
    """
    # email.utils.parsedate() does the job for RFC 1123 dates; unfortunately
    # RFC 9110 makes it mandatory to support RFC 850 dates too. So we roll
    # our own RFC-compliant parsing.
    for regex in RFC1123_DATE, RFC850_DATE, ASCTIME_DATE:
        m = regex.match(date)
        if m is not None:
            break
    else:
        raise ValueError("%r is not in a valid HTTP date format" % date)
    try:
        year = int(m["year"])
        if year < 100:
            current_year = datetime.now(tz=timezone.utc).year
            current_century = current_year - (current_year % 100)
            if year - (current_year % 100) > 50:
                # year that appears to be more than 50 years in the future are
                # interpreted as representing the past.
                year += current_century - 100
            else:
                year += current_century
        month = MONTHS.index(m["mon"].lower()) + 1
        day = int(m["day"])
        hour = int(m["hour"])
        min = int(m["min"])
        sec = int(m["sec"])
        result = datetime(year, month, day, hour, min, sec, tzinfo=timezone.utc)
        return int(result.timestamp())
    except Exception as exc:
        raise ValueError("%r is not a valid date" % date) from exc

To Reproduce

  1. Go to Jellyfin plugin in Kodi
  2. Go to your library
  3. Open the context menu on the library and click on Sync
  4. See error in the Kodi log .kodi/temp/kodi.log

Expected behavior

No error in Kodi log and probably a working last sync guard.

Logs

2024-09-16 12:40:45.749 T:740028    info <general>: JELLYFIN.jellyfin_kodi.library -> ERROR::jellyfin_kodi/library.py:536 time data '16 Sep 2024 10:40:33 GMT' does not match format '%d %b %Y %H:%M:%S GMT'
                                                   Traceback (most recent call last):
                                                     File "jellyfin_kodi/library.py", line 530, in save_last_sync
                                                       time_now = datetime.strptime(
                                                                  ^^^^^^^^^^^^^^^^^^
                                                     File "/usr/lib/python3.12/_strptime.py", line 554, in _strptime_datetime
                                                       tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                     File "/usr/lib/python3.12/_strptime.py", line 333, in _strptime
                                                       raise ValueError("time data %r does not match format %r" %
                                                   ValueError: time data '16 Sep 2024 10:40:33 GMT' does not match format '%d %b %Y %H:%M:%S GMT'

2024-09-16 12:40:45.753 T:740028    info <general>: JELLYFIN.jellyfin_kodi.library -> INFO::jellyfin_kodi/library.py:541 --[ sync/2024-09-16T10:38:45z ]
2024-09-16 12:40:45.756 T:740028    info <general>: JELLYFIN.jellyfin_kodi.full_sync -> INFO::jellyfin_kodi/full_sync.py:227 Full sync completed in: 0:00:59
2024-09-16 12:40:45.759 T:740028    info <general>: JELLYFIN.jellyfin_kodi.full_sync -> INFO::jellyfin_kodi/full_sync.py:703 --<[ fullsync ]

Screenshots

System (please complete the following information):

Additional context

I tested this that seems to work.

6 import threading
    7 from datetime import datetime, timedelta
    8 from email.utils import parsedate_to_datetime
    9 import queue
   10         

 527     def save_last_sync(self):
  528 
  529         try:
  530             time_now = parsedate_to_datetime(
  531                 self.server.config.data["server-time"].split(", ", 1)[1],
  532                 "%d %b %Y %H:%M:%S GMT",
  533             ) - timedelta(minutes=2)
  534         except Exception as error:
  535 
  536             LOG.exception(error)
  537             time_now = datetime.utcnow() - timedelta(minutes=2)
  538 
  539         last_sync = time_now.strftime("%Y-%m-%dT%H:%M:%Sz")
  540         settings("LastIncrementalSync", value=last_sync)
  541         LOG.info("--[ sync/%s ]", last_sync)
  542 
oddstr13 commented 2 days ago

I haven't dug quite deep enough yet, but using ISO time here would be best. The requests the dates are coming from are presumably handled by the Jellyfin plugin, KodiSyncQueue.

If not already available somewhere in the API response, it should be added either as a custom header, or an additional field, in the plugin.

TODO: Investigate the relevant API responses and their structure, to see if ISO time is provided, or if the structure of the response allows for an additional field containing the server datetime. Failing that, a response header with the ISO datetime should be added. X-ISO-8601 or some such if no (defacto) standard header for this exists.

Messing with localized time string shall be avoided if at all possible, only trouble lies down that path.

If I could decide, the whole world would switch to ISO 8601, UTC, with no DST. That is how big of a pain this is.

ISO 8601 was published on 06/05/88 and most recently amended on 12/01/04. https://xkcd.com/1179/

oddstr13 commented 2 days ago

I should note that this failure to parse datetime shouldn't be an issue unless the server is more than 2min out of sync with the client, something that is highly unlikely (should be near impossible) if both the server and client are properly synced with a time server.

The consequence of the time being sufficiently out of sync (for this particular function anyways) is that content update might be missing during parial sync (events that happened in the window between server time and client time, minus the 2min tolerance that is put in the code here via a timedelta).

The error in the log is otherwise harmless, except from potentially causing slightly slower sync times and larger log files (the error is caught, then logged – should probably get changed to only log the message as a warning, not the whole stack-trace as an error).

prahal commented 1 day ago

I should note that this failure to parse datetime shouldn't be an issue unless the server is more than 2min out of sync with the client, something that is highly unlikely (should be near impossible) if both the server and client are properly synced with a time server.

OK. So I will lower its priority on my TODO.

Messing with localized time string shall be avoided if at all possible, only trouble lies down that path.

If I could decide, the whole world would switch to ISO 8601, UTC, with no DST. That is how big of a pain this is.

I don't understand. The date string from HTTP is not localized, and is already UTC/GMT (only it is not IS0 8601, but is that critical?). strptime in save_last_sync expects the current month for a user session with a French locale to be sept. (%b) while the HTTP Date header returns a non localized "Sep"' for the month of September. Thus, I get: ValueError: time data '16 Sep 2024 10:40:33 GMT' does not match format '%d %b %Y %H:%M:%S GMT' because strptime expect 16 sept. 2024 10:40:33 GMT for '%d %b %Y %H:%M:%S GMT It bugs because the date is not localized, not the opposite.

I agree that with an ISO date we would be able to use a localized date parsing function like strptime without an error. Because there would be no string thus no localization (even though I am not confident numbers are always localized as indian/arabic numbers, then ISO would not help with strptime use). But in my opinion, the issue is not the date format. It is about using strptime to parse a date that is not localized. I don't think these functions are intended to be used in this context (even for an ISO8601 date format).

But I confirm that with this bug, jellyfin-kodi still works. Might be slower.

I believe the best is to include a function to parse HTTP RFC9110 section 5.6.7 date format like django does. But I am uneasy copying and pasting it into jellyfin-kodi as django is BSD 3-Clause, not GPL v3. We could ask one to rewrite a similar function based on general guidelines, i.e. parse the date with regex placeholders, support 2 digit years. Or even ask django developers for permission to reuse this code. But now that I saw their code, I believe I cannot write this function myself (probably I should not have pasted the function here to avoid other having the same issue, do you want me to edit it out?).

In the meantime, we could the closest available function in python libraries, email.utils.parsedate_to_datetime if requiring python 3.3 or above is OK. Or email.utils.parsedate then convert to datetime. Are you fine with this option? Or only an ISO 8601 patch would be merged?

I confirm there is no "8601" string in KodiSyncQueue, so probably no API that returns an X-ISO-8601 date. But as far as I looked, KodiSyncQeue does add or modify HTTP headers. I believe the HTTP Date header comes from Microsoft ASP dotnet.

oddstr13 commented 4 hours ago

@prahal I realize that you probably wanted to work on this, so I apologize for jumping in and doing it myself :melting_face:

I've implemented a Server-Time header in KodiSyncQueue, in addition to (probably) fixing the parsing of the standard Date header.

Both PRs could do with some testing, especially the one for this repo, as I've yet to test the different code-paths of it, if you're interested in having a look at that. Feel free to comment on my code too, if you have questions around any of it :slightly_smiling_face: