Open 7489fbbc-49c9-4ee0-8d47-aa791683f515 opened 5 years ago
http.cookiejar (cookielib in Python 2.*) fails to parse the Expires date of some cookies.
For example, "Friday, 1-August-1997 00:00:00 GMT" does not work, while "Fri, 01 Aug 1997 00:00:00 GMT" works fine.
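A minimal way to reproduce the difference, assuming the module-level helper `http.cookiejar.http2time` (an undocumented function in the module) is what ends up parsing the Expires value. The result for the long form depends on the Python version, so no output is claimed for it here:

```python
import calendar
from http.cookiejar import http2time

short_form = "Fri, 01 Aug 1997 00:00:00 GMT"
long_form = "Friday, 1-August-1997 00:00:00 GMT"

# Epoch seconds for 1997-08-01 00:00:00 UTC, computed independently.
expected = calendar.timegm((1997, 8, 1, 0, 0, 0))

short_result = http2time(short_form)
long_result = http2time(long_form)

print("short form:", short_result, "expected:", expected)
# On an affected version this prints None (date not understood).
print("long form:", long_result)
```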
This is caused by the long month name: the month token is compared with MONTHS_LOWER, a list of 3-letter month names. So I propose a small change to the definition of LOOSE_HTTP_DATE_RE (see the fifth line):
LOOSE_HTTP_DATE_RE = re.compile(
r"""^
(\d\d?) # day
(?:\s+|[-\/])
(\w{3})\w* # month (3 first letters only)
...
Instead of:
LOOSE_HTTP_DATE_RE = re.compile(
r"""^
(\d\d?) # day
(?:\s+|[-\/])
(\w+) # month
...
I've only tested http.cookiejar (Python 3.6), but I suppose the same change will work in cookielib.
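The effect of the proposed change can be sketched with a reduced pattern. This is only an illustration: `CURRENT_PREFIX` and `PROPOSED_PREFIX` are hypothetical stand-ins covering just the day and month part of the expression, and `month_index` is a made-up helper, not a function from http.cookiejar.

```python
import re

MONTHS_LOWER = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]

# Reduced stand-ins for the day/month prefix only (assumption: the rest
# of the real expression is left out for brevity).
CURRENT_PREFIX = re.compile(r"^(\d\d?)(?:\s+|[-\/])(\w+)")
PROPOSED_PREFIX = re.compile(r"^(\d\d?)(?:\s+|[-\/])(\w{3})\w*")


def month_index(pattern, text):
    """Return the 1-based month number, or None if the lookup fails."""
    match = pattern.match(text)
    if match is None:
        return None
    try:
        return MONTHS_LOWER.index(match.group(2).lower()) + 1
    except ValueError:
        return None


text = "1-August-1997 00:00:00 GMT"  # weekday assumed already stripped
print(month_index(CURRENT_PREFIX, text))   # None
print(month_index(PROPOSED_PREFIX, text))  # 8
```

One side effect visible in this sketch: a month token shorter than three characters, such as the numeric "08" in "01-08-1997", no longer matches the proposed pattern at all, while the current pattern still captures it.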
Thanks in advance
Thanks for the report. As far as I can see from the RFC, the month seems to follow a three-letter code. Is there a part of the RFC where Python is not compliant? I can't find any related issues or RFC links that allow the month format given in the report. Can you please add the relevant part of the RFC, or links, if any?
Date RFC 6265 5.1.1 : https://tools.ietf.org/html/rfc6265.html#section-5.1.1
Thanks for your answer. I have not found any RFCs with full month names either. I'm afraid I'm not an expert here.
But the fact is that I do get them in my work. Here is an example of a response header:
HTTP/1.1 200 OK
Server: Oracle-iPlanet-Web-Server/7.0
Date: Tue, 10 Oct 2018 14:29:44 GMT
Version-auth-credencial: v.3.0.1 Iplanet - Sun Solaris - Contexto Multiple
Set-cookie: JSESSIONIDE=Del; expires=Friday, 1-August-1997 00:00:00 GMT; domain=...
I do not know if it's an old date format (?)... or if it is a quite rare case...
I had previously written some bash scripts using wget and they work fine, but I had problems with python3 (and the requests module) until I tracked down this issue. And it was not very easy: I am very new to Python :(
That's the reason for my proposal. It's just to be consistent: if we compare 3 letters of a month with MONTHS_LOWER, let's use just the (first) 3 letters.
Perhaps modifying LOOSE_HTTP_DATE_RE is not a good idea. Another option could be to truncate the month variable (mon).
It could be done inside the _str2time function, for example:
def _str2time(day, mon, yr, hr, min, sec, tz):
    mon = mon[:3] # ensure 3 letters
    yr = int(yr)
Anyway, I'll try to find why those long month names appear.
Thank you
No problem, I am not an expert either. I just skimmed through the RFC and could not find anything about full month names. So I wanted to check whether there are any recent changes I am missing, or whether the server is configured to set the cookie expiration with a full month name, since no related issues have been raised as far as I could find in the bug tracker. I will wait for others to comment on this.
Thanks
Yes, I was thinking that it could be a matter of configuration of the server (?).
By the way, and just for fun, I've just realized that truncating mon at the beginning of the _str2time function is a very bad idea because mon could also be an int.
A better place is where the index in the MONTHS_LOWER list is looked up (and where a possible exception is already handled): try: mon = MONTHS_LOWER.index(mon[:3].lower())+1
(perhaps split into 2 statements for clarity)
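A minimal sketch of that idea as a standalone function, to show how the truncation and the numeric fallback could coexist. `month_number` is a hypothetical helper written for illustration; it is not the body of `_str2time`, and the `str()` conversion is an added assumption so that an int argument is tolerated.

```python
MONTHS_LOWER = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]


def month_number(mon):
    """Hypothetical helper: map a month token to 1..12, or return None."""
    token = str(mon)  # tolerate an int as well as a string
    try:
        # Only the first three letters are compared with MONTHS_LOWER.
        return MONTHS_LOWER.index(token[:3].lower()) + 1
    except ValueError:
        pass
    # Not a month name: fall back to a numeric month.
    try:
        number = int(token)
    except ValueError:
        return None
    return number if 1 <= number <= 12 else None


for token in ("August", "Aug", "08", 8, "Foo"):
    print(repr(token), "->", month_number(token))
```

Because the slice is taken only at lookup time, a numeric token such as "08" is left untouched and still reaches the `int()` fallback.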
OK, waiting for experts' comments.
I'm really enjoying Python.
RFC 6265 says that only the first three letters of the month are significant, and the rest of the token should be ignored. See https://tools.ietf.org/html/rfc6265#section-5.1.1:
month = ( "jan" / "feb" / "mar" / "apr" /
"may" / "jun" / "jul" / "aug" /
"sep" / "oct" / "nov" / "dec" ) *OCTET
I have not heard of an Expires field syntax with a numeric month.
Hello, I found this issue to be the one most closely related to a problem I discovered: a long day name isn't parsed. According to https://tools.ietf.org/html/rfc2616#section-3.3.1:
Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov 6 08:49:37 1994 ; ANSI C's asctime() format
HTTP/1.1 clients and servers that parse the date value MUST accept all three formats (for compatibility with HTTP/1.0), though they MUST only generate the RFC 1123 format for representing HTTP-date values in header fields.
The month format is correct, but the day-name part should accept both forms (short and long).
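For reference, the three formats quoted above can be told apart with the standard `datetime.strptime`. This is only a sketch of the formats themselves, assuming an English (C) locale for the day and month names; `parse_http_date` and `HTTP_DATE_FORMATS` are hypothetical names and this is not how http.cookiejar parses dates.

```python
from datetime import datetime

# One strptime pattern per format listed in RFC 2616 section 3.3.1.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850 (long day name, 2-digit year)
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime()
)


def parse_http_date(text):
    """Hypothetical helper: try each format in turn, None if none fits."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


for sample in ("Sun, 06 Nov 1994 08:49:37 GMT",
               "Sunday, 06-Nov-94 08:49:37 GMT",
               "Sun Nov 6 08:49:37 1994"):
    print(sample, "->", parse_http_date(sample))
```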
Thanks,
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library']
title = "cookielib/cookiejar cookies' Expires date parse fails with long month names"
updated_at =
user = 'https://bugs.python.org/albmoral'
```
bugs.python.org fields:
```python
activity =
actor = 'lpopil'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'alb_moral'
dependencies = []
files = []
hgrepos = []
issue_num = 34951
keywords = ['patch']
message_count = 7.0
messages = ['327461', '327475', '327482', '327484', '327486', '327491', '365882']
nosy_count = 4.0
nosy_names = ['martin.panter', 'xtreak', 'alb_moral', 'lpopil']
pr_nums = ['19393']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue34951'
versions = ['Python 2.7', 'Python 3.6']
```