python / cpython

The Python programming language
https://www.python.org

cookielib/cookiejar cookies' Expires date parse fails with long month names #79132

Open 7489fbbc-49c9-4ee0-8d47-aa791683f515 opened 5 years ago

7489fbbc-49c9-4ee0-8d47-aa791683f515 commented 5 years ago
BPO 34951
Nosy @vadmium, @tirkarthi, @lpopil
PRs
  • python/cpython#19393
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['type-bug', 'library']
    title = "cookielib/cookiejar cookies' Expires date parse fails with long month names"
    updated_at =
    user = 'https://bugs.python.org/albmoral'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'lpopil'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'alb_moral'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34951
    keywords = ['patch']
    message_count = 7.0
    messages = ['327461', '327475', '327482', '327484', '327486', '327491', '365882']
    nosy_count = 4.0
    nosy_names = ['martin.panter', 'xtreak', 'alb_moral', 'lpopil']
    pr_nums = ['19393']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue34951'
    versions = ['Python 2.7', 'Python 3.6']
    ```

    7489fbbc-49c9-4ee0-8d47-aa791683f515 commented 5 years ago

    http.cookiejar (cookielib in Python 2.*) fails to parse the Expires date of some cookies.

    For example, "Friday, 1-August-1997 00:00:00 GMT" is not parsed, while "Fri, 01 Aug 1997 00:00:00 GMT" works fine.
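A minimal way to check the two date strings above, assuming the Expires value is handed to `http.cookiejar.http2time` (an undocumented module-level helper, so treat this as a sketch rather than a supported API). The result for the long form depends on the Python version in use, so no output is claimed for it here.

```python
import calendar
from http.cookiejar import http2time  # undocumented helper; an assumption of this sketch

short_form = "Fri, 01 Aug 1997 00:00:00 GMT"
long_form = "Friday, 1-August-1997 00:00:00 GMT"

# Reference value: seconds since the epoch for 1997-08-01 00:00:00 UTC.
expected = calendar.timegm((1997, 8, 1, 0, 0, 0))

short_ts = http2time(short_form)
long_ts = http2time(long_form)  # None when the date cannot be parsed

print("short form:", short_ts, "(matches expected)" if short_ts == expected else "")
print("long form: ", long_ts)
```

If the long form prints `None`, the installed version is affected by the behaviour described in this report.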

    The cause is the long month name: the month token is looked up in MONTHS_LOWER, a list of 3-letter month names, so "August" is not found. I propose a small change to the definition of LOOSE_HTTP_DATE_RE (see the fifth line):

    LOOSE_HTTP_DATE_RE = re.compile(
        r"""^
        (\d\d?)            # day
           (?:\s+|[-\/])
        (\w{3})\w*         # month (3 first letters only)
        ...

    Instead of:

    LOOSE_HTTP_DATE_RE = re.compile(
        r"""^
        (\d\d?)            # day
           (?:\s+|[-\/])
        (\w+)              # month
        ...

    I've tested only http.cookiejar (Python 3.6), but I suppose the same change will work in cookielib.
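To see what the two month groups capture, here is a small sketch using simplified stand-ins for the day/month part of the pattern only. The names `MONTH_AS_WORD` and `MONTH_FIRST_THREE` are hypothetical, and the real LOOSE_HTTP_DATE_RE continues with year, time and timezone groups that are left out here.

```python
import re

# Simplified stand-ins covering only the day and month groups.
MONTH_AS_WORD = re.compile(r"^(\d\d?)(?:\s+|[-/])(\w+)")            # current form
MONTH_FIRST_THREE = re.compile(r"^(\d\d?)(?:\s+|[-/])(\w{3})\w*")   # proposed form

text = "1-August-1997 00:00:00 GMT"  # weekday prefix assumed already stripped

word_month = MONTH_AS_WORD.match(text).group(2)
short_month = MONTH_FIRST_THREE.match(text).group(2)
print(word_month, short_month)

# Side effect worth noting: a one- or two-digit month no longer matches,
# because \w{3} needs three word characters before the next separator.
numeric_old = MONTH_AS_WORD.match("1-08-1997")
numeric_new = MONTH_FIRST_THREE.match("1-08-1997")
print(numeric_old is not None, numeric_new is not None)
```

The second check suggests the regex change is not fully backward compatible for inputs with numeric months, which may be a reason to truncate at lookup time instead.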

    Thanks in advance

    tirkarthi commented 5 years ago

    Thanks for the report. As far as I can see from the RFC, the month follows a three-letter code. Is there a part of the RFC that Python does not comply with? I can't find any related issues or RFC links allowing the month format given in the report. Can you please add the relevant part of the RFC, or links, if any?

    Dates, RFC 6265 section 5.1.1: https://tools.ietf.org/html/rfc6265.html#section-5.1.1

    7489fbbc-49c9-4ee0-8d47-aa791683f515 commented 5 years ago

    Thanks for your answer. I have not found any RFCs with full month names either. I'm afraid I'm not an expert here.

    But the fact is that I get them in my work. Here is an example of a response header:

    HTTP/1.1 200 OK
    Server: Oracle-iPlanet-Web-Server/7.0
    Date: Tue, 10 Oct 2018 14:29:44 GMT
    Version-auth-credencial: v.3.0.1 Iplanet - Sun Solaris - Contexto Multiple
    Set-cookie: JSESSIONIDE=Del; expires=Friday, 1-August-1997 00:00:00 GMT; domain=...

    I do not know if it's an old date format (?)... or if it is quite a rare case...

    I had previously written some bash scripts using wget and they work fine, but I had problems with Python 3 (and the requests module) until I tracked down this issue. And it was not very easy: I am very new to Python :(

    That's the reason for my proposal. It's just to be coherent: if we compare 3 letters of a month with MONTHS_LOWER, let's use just the first 3 letters.

    Perhaps modifying LOOSE_HTTP_DATE_RE is not a good idea. Another option could be to truncate the month variable (mon).

    It could be done inside the _str2time function, for example:

    def _str2time(day, mon, yr, hr, min, sec, tz):
        mon = mon[:3]  # assure 3 letters
        yr = int(yr)

    Anyway, I'll try to find why those long month names appear.

    Thank you

    tirkarthi commented 5 years ago

    No problem, I am also not an expert; I just skimmed through the RFC and could not find anything about full month names. I wanted to check whether there are recent changes I am missing, or whether the server is configured to set the cookie expiration with a full month name, since I could not find any related issue in the bug tracker. I will wait for others to comment on this.

    Thanks

    7489fbbc-49c9-4ee0-8d47-aa791683f515 commented 5 years ago

    Yes, I was thinking that it could be a matter of configuration of the server (?).

    By the way, and just for fun, I've just realized that truncating mon at the beginning of the _str2time function is a very bad idea, because mon could also be an int.

    A better place is where the MONTHS_LOWER index is looked up (and a possible exception is handled there):

    try:
        mon = MONTHS_LOWER.index(mon[:3].lower())+1

    (perhaps in 2 statements for clarity)
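A self-contained sketch of that lookup, with the truncation and the numeric fallback written out. The helper name `month_number` is hypothetical, and `MONTHS_LOWER` is redefined locally so the snippet runs on its own; it is not a copy of the library code.

```python
MONTHS_LOWER = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]


def month_number(mon):
    """Hypothetical helper: map a month token (string) to 1-12, or None."""
    # Only the first three letters are compared, so "August" and "Aug" agree.
    try:
        return MONTHS_LOWER.index(mon[:3].lower()) + 1
    except ValueError:
        pass
    # Not a month name: the token may already be a number such as "08".
    try:
        imon = int(mon)
    except ValueError:
        return None
    return imon if 1 <= imon <= 12 else None


for token in ("August", "Aug", "08", "13", "xyz"):
    print(token, month_number(token))
```

Truncating only at the lookup leaves numeric tokens untouched, which avoids the problem with slicing at the top of the function.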

    OK, waiting for experts' comments.

    I'm really enjoying Python.

    vadmium commented 5 years ago

    RFC 6265 says that only the first three letters of the month are significant, and the rest of the token should be ignored. See https://tools.ietf.org/html/rfc6265#section-5.1.1:

    month = ( "jan" / "feb" / "mar" / "apr" /
        "may" / "jun" / "jul" / "aug" /
        "sep" / "oct" / "nov" / "dec" ) *OCTET

    I have not heard of an Expires field syntax with a numeric month.

    65318757-63bd-443c-ac7a-623d387381dc commented 4 years ago

    Hello, I found this issue as the one most closely related to a problem I discovered: a long day name is not parsed. According to https://tools.ietf.org/html/rfc2616#section-3.3.1:

      Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
      Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
      Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format

    HTTP/1.1 clients and servers that parse the date value MUST accept all three formats (for compatibility with HTTP/1.0), though they MUST only generate the RFC 1123 format for representing HTTP-date values in header fields.

    The month format is correct, but the day part should accept both the short and the long form.

    Thanks,