Pure Python strptime() (PEP 42)

brettcannon commented 23 years ago

BPO	474274
Nosy	@gvanrossum, @smontanaro, @warsaw, @brettcannon, @nascheme, @jaraco
Files	diff_time: diff for time importing _strptime against 2002-07-10 CVS _strptime.py: uploaded 2002-07-17 test_time.py: uploaded 2002-07-17 test_strptime.py: uploaded 2002-07-17 libtime_diff: uploaded 2002-07-17: patch for lib/libtime.tex

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = 'https://github.com/gvanrossum' closed_at = created_at = labels = ['library'] title = 'Pure Python strptime() (PEP 42)' updated_at = user = 'https://github.com/brettcannon' ``` bugs.python.org fields: ```python activity = actor = 'nnorwitz' assignee = 'gvanrossum' closed = True closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'brett.cannon' dependencies = [] files = ['3706', '3707', '3708', '3709', '3710'] hgrepos = [] issue_num = 474274 keywords = ['patch'] message_count = 39.0 messages = ['37953', '37954', '37955', '37956', '37957', '37958', '37959', '37960', '37961', '37962', '37963', '37964', '37965', '37966', '37967', '37968', '37969', '37970', '37971', '37972', '37973', '37974', '37975', '37976', '37977', '37978', '37979', '37980', '37981', '37982', '37983', '37984', '37985', '37986', '37987', '37988', '37989', '37990', '37991'] nosy_count = 7.0 nosy_names = ['gvanrossum', 'skip.montanaro', 'barry', 'nnorwitz', 'brett.cannon', 'nascheme', 'jaraco'] pr_nums = [] priority = 'normal' resolution = 'accepted' stage = None status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue474274' versions = ['Python 2.3'] ```

brettcannon commented 23 years ago

The attached file contains a pure Python version of strptime(). It attempts to operate as much like time.strptime() within reason. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable.

PEP-42 makes a request for a portable, consistent version of time.strptime():

Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission).

A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings. It is very straight-forward to include your own localization settings or modify the two languages included in the module (English and Swedish).

If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook).

-Brett Cannon

nascheme commented 22 years ago

Logged In: YES user_id=35752

I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Oops. I thought I had removed the clause. Feel free to remove it.

I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch?

nascheme commented 22 years ago

Logged In: YES user_id=35752

Go ahead and reuse this item. I'll wait for the updated version.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Version 2 of strptime() has now been uploaded. This nearly complete rewrite includes the removal of the need to input locale-specific time info. All need locale info is gleaned from time.strftime(). This makes it able to behave exactly like time.strptime().

d21744ff-f396-4c71-955e-7dbd2e886779 commented 22 years ago

Logged In: YES user_id=33168

Overall, the patch looks pretty good.
I didn't check for completeness or consistency, though.

You don't need: from exceptions import Exception
The comment "from strptime import * will only export strptime()" is not correct.
I'm not sure what should be included for the license.
Why do you need success flag in CheckIntegrity, you raise an exception? (You don't need to return anything, raise an exception, else it's ok)
In return_time(), could you change xrange(9) to range(len(temp_time)) this removes a dependancy.
strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY'
daylight should use the new bools True, False (this also applies to any other flags) * The formatting doesn't follow the standard (see PEP-8) (specifically, spaces after commas, =, binary ops, comparisons, etc)
Long lines should be broken up The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up), the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP-42 has a comment about a python strptime. So if this gets implemented, we need to update PEP-42. Thanks.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

I'm afraid you looked at the wrong patch! My fault since I accidentally forgot to add a description for my patch. So the file with no description is the newest one and completely supercedes the older file. I am very sorry about that. Trust me, the new version is much better.

I realized the other day that since the time module is a C extension file, would getting this accepted require getting BDFL approval to add this as a separate module into the standard library? Would the time module have to have a Python interface module where this is put and all other methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that still apply to the new file:

strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' -> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it.
daylight should use the new bools True, False (this also applies to any other flags) -> Oops. Since I wrote this under Python 2.2.1 I didn't think about it. I will go through the code and look for places where True and False should be used.

-Brett C.

smontanaro commented 22 years ago

Logged In: YES user_id=44345

Brett,

Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors. If parsetime == time.strptime I get

\====================================================================== ERROR: Test timezone directives. ----------------------------------------------------------------------

Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: ** Test %c directive. **\ ----------------------------------------------------------------------

Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

====================================================================== ERROR: Test for month directives. ----------------------------------------------------------------------

Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems? Got fixes for the strptime.strptime problems?

Thx,

Skip

brettcannon commented 22 years ago

Logged In: YES user_id=357491

I have uploaded a verision 2.0.1 which fixes the %b format bug (stupid typo on a variable name).

As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it?

As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you have for strftime('%Z'), and time.tzname?

I don't know how much %Z should be worried about since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it?

-Brett

smontanaro commented 22 years ago

Logged In: YES user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c. If you look at your output you will notice that the day of the month is '4', but if you look at the docs for time.strftime() you will notice that is specifies the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; not being represented by 2 digits caused the regex to fail.

Now the question becomes do we follow the spec and chaulk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possible improper output from strftime()? Changing the regexes should not be a big issue since I could just tack on '\d' as the last option for all numerical regexes.

As for the test error from time.strptime(), I don't know what is causing it. If you look at the test you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). Now how that can fail I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is a case of this.

One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Should I ask now to get the ball rolling? Since I just joined python-dev literally this morning I don't know what the protocol is.

d21744ff-f396-4c71-955e-7dbd2e886779 commented 22 years ago

Logged In: YES user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not big to me. Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error (in __init__) you could change 'sequence of ... must be X items long' to '... must have/contain X items'.

Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP-8).

Instead of doing: newlist = list(orig) newlist.append('') x = tuple(newlist)

you could do: x = tuple(orig[:]) or something like that. Perhaps a helper function?

In __init do you want to check the params against 'is None' If someone passes a non-sequence that doesn't evaluate to False, the __init won't raise a TypeError which it probably should.

What is the magic date used in __calcweekday()? (1999/3/15+ 22:44:55) is this significant, should there be a comment? (magic dates are used elsewhere too, e.g., \_calcmonth, \_calc_am_pm, many more)

__calcmonth() doesn't seem to take leap year into account? (not sure if this is a problem or not) In \_calc_date_time(), you use date_time[offset] repetatively, couldn't you start the loop with something like dto = date_time[offset] and then use dto (dto is not a good name, I'm just making an example)

Are you supposed to use __init when deriving from built-ins (TimeRE(dict)) or __new? (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length)

Note: you can do x = y = z = -1, instead of x = -1 ; y = -1 ; z = -1

It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c.

This docstring seems backwards: def gregToJulian(year, month, day): """Calculate the Gregorian date from the Julian date.""" I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

brettcannon commented 22 years ago

Logged In: YES user_id=357491

I went ahead an implemented most of Neal's suggestions. On a few, of them, though, I either didn't do it or took a slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not since it doesn't make the code harder to read. Implementing it actually had me catch some redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked to check that they are either None or have the proper length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that could specify front or back concatination. Not much different from before, but it allows me to specify front or back concatination easily.

I explained why the various magic dates were used.

I in no way have to worry about leap year. Since it is not validating the data string for validity the fxn just takes the data and uses it. I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. Since dict is obviously mutable, I don't need to worry about it.

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1. Now it barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is not. Didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards. Thanks for catching that, Neal.

I also went through and added True and False where appropriate. There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed if being run under Python 2.3.

And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal and so the memory of seeing beginning programmers' code is still fresh in my mind \<shudders>.

And I will query python-dev about how to go about to get this added after the bugs are fixed and I am back home (going to be out of town until June 16). I will still be periodically checking email, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

I am back from my vacation and ready to email python- dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will do the email on June 17.

Before then, though, I am going to make two changes.
One is the raise a Value Error exception if the regex doesn't match (to try to match time.strptime()s exception as seen in Skip's run of the unit test). The other change is to tack on a \d on all numeric formats where it might come out as a single digit (i.e., lacking a leading zero). This will be for v2.0.3 which I will post before June 17.

If there is any reason anyone thinks I should hold back on this, please let me know! I would like to have this code as done as possible before I make any announcement.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

I uploaded v.2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric month regex. I discovered that ANSI C for ctime displays the month with a leading space if it is a single digit. So to deal with that since at least Skip's C library likes to use that format for %c, I went ahead and added it.

I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything that is grouped together in LocaleTime is. It also simplified the code slightly and led to less conversions of data types.

I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the set value for property() it wouldn't work; didn't realize it just defaults over to __setitem__. So I added that method as the set value for all of the property()s.

It does require 2.2.1 now since I used True and False without defining them. Obviously just set those values to 1 and 0 respectively if you are running under 2.2

I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think).

brettcannon commented 22 years ago

Logged In: YES user_id=357491

I have uploaded v. 2.0.4. It now uses the calendar module to figure out the names of weekdays and months. Thanks goes out to Guido for pointing out this undocumented feature of calendar.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

2.1.0 is now up and ready for use. I only changed two things to the code, but since they change the semantics of stprtime()s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. One is because I forgot about how default arguments are created at the time of function creation and not at each fxn call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object for every call for the new locale, they would get bad matches. That is not good.

The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it. This is meant to act as a drop-in replacement for time.strptime(). That forced the removal of the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log files in other languages other then there default locale. And if they were, they should change the locale for the interpreter and not just for strptime().

The second change was what triggers strptime() to return an re object that it can use. Initially it was any nothing value (i.e., would be considered false), but I realized that an empty string could trigger that and it would be better to raise a TypeError then let some error come up from trying to use the re object in an incorrect way.

Now, to have an re object returned, you pass in False. I figured that there is a very minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned.

I debated about removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__. So building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one everytime took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number, IMO, for calls in a row; I checked the Apache hit logs for a single day on Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

I have uploaded a contextual diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined just as Guido requested.

It's my first extension module, so I am not totally sure of myself with it. But since Alex Marttelli told me what I needed to do I am fairly certain it is correct.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

2.1.1 is now uploaded. Almost a purely syntatical change. From discussions on python-dev I renamed the helper fxns so they are all lowercase-style. Also changed them so that they state what the fxn returns.

I also put all of the imports on their own line as per PEP-8.

The only semantical change I did was directly import re.compile since it is the only thing I am using from the re module.

These changes required tweaking of my exhaustive testing suite, so that got uploaded, too.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. Biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs.

I also fiddled with the regexes so that the groups were none-capturing. Mainly done for a possible performance improvement.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime.

gvanrossum commented 22 years ago

Logged In: YES user_id=6380

Hm. This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>>

Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it?

brettcannon commented 22 years ago

Logged In: YES user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file. Sorry about that. I misnamed the new one 'time_diff" but in my head I meant to overwrite "diff_time". I have uploaded the new one.

(b) See (a)

(c) Oops. That is a complete oversight on my part. Now in (d) you mention writing up regression tests for the standard time.strptime. I am quite hapy to do this. Do you want that as a separate patch? If so I will just stop with uploading tests here and just start a patch with my strptime tests for the stdlib tests.

(d) The reason this test failed is because your input is not compliant with the Python docs. Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month. My implementation follows the docs and not what glibc suggests. If you want, I can obviously add on to all the regexes \d as an option and eliminate this issue. But that means it will no longer be following the docs. This tripped Skip up too since no one writes numbers that way; strftime does, though. Now if the docs meant for no trailing 0, I think they should be rewritten since that is misleading.

In other words, either strptime stays as it is and follows the docs or I change the regexes, but then the docs will have to be changed. I can go either way, but I personally would want to follow the docs as-is since strptime is meant to parse strftime output and not human output. =)

gvanrossum commented 22 years ago

Logged In: YES user_id=6380

Hm, the new diff_time *still* fails to apply. But don't worry about that.

I'd love to see regression tests for time.strptime. Please upload them here -- don't start a new patch.

I think your interpretation of the docs is overly restrictive; the table shows what strftime does but I think it's reasonable for strptime to accept missing leading zeros. You can upload a patch for the docs too if you feel that's necessary. You may also try to read up on what the XPG standard says about strptime.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Uploaded 2.1.4. I added \d to the end of all relevant regexes (basically all of them but %y and %Y) to deal with non-zero-leading numbers.

I also made the regex case-insensitive.

As for the diff failing, I am wondering if I am doing something wrong. I am just running diff -c CVS_file modified_file > diff_file . Isn't that right?

I will work on merging my strptime tests into the time regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with was the NetBSD man pages for strptime (which they claim is XPG compliant). The difference between that implementation and mine is that NetBSD's allows whitespace (defined as isspace()) in the format string to match \s* in the data string. It also requires a whitespace or a non-alphanumeric character while my implementation does not require that.

Personally, I don't like either difference. If they were used, though, there might be a possibility of rewriting strptime to just use a bunch of string methods instead of regexes for a possible performance benefit. But I prefer regexes since it adds checks of the input. That and I just like regexes period. =)

Also, I noticed that your little test returned 0 for all unknown values. Mine returns -1 since 0 can be a legitimate value for some and I figured that would eliminate ambiguity. I can change it to 0, though.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Two things have been uploaded. First, test_time.py w/ a strptime test. It is almost an exact mirror of the strftime test; only difference is that I used strftime to test strptime. So if strftime ever fails, strptime will fail also. I feel this is fine since strptime depends on strftime so much that if strftime were to fail strptime would definitely fail.

The other file is version 2.1.5 of strptime. I made two changes. One was to remove the TypeError raised when %I was used without %p. This was from me being very picky about only accepting good data strings. The second was to go through and replace all whitespace in the format string with \s*. That basically makes this version of strptime XPG compatible as far as I (and the NetBSD man page) can tell. The only difference now is that I do not require whitespace or a non-alphanumeric character between format strings. Seems like a ridiculous requirement since the requirement that whitespace be able to compress down to no whitespace negates this requirement. Oh well, we are more than compliant now.

I decided not to write a patch for the docs to make them read more leniently for what the format directives. Figured I would just let people who think like me do it in a more "proper" way with leading zeros and those who don't read it like that to still be okay.

I think that is everything. If you want more in-depth tests, Guido, I can add them to the testing suite, but I figured that since this is (hopefully) going in bug-free it needs only be checked to make sure it isn't broken by anything. And if you do want more in-depth tests, do you want me to add mirror tests for strftime or not worry about that since that is the ANSI C library's problem? Other then that, I think strptime is pretty much done.

gvanrossum commented 22 years ago

Logged In: YES user_id=6380

Can you please delete all the obsolete uploads? (If SF won't let you, let me know and I'll do it for you, leaving only the most recend version of each.)
There' still a confusion between strptime.py and _strptime.py; your test_time.py imports strptime, and so does the latest version of test_strptime.py I can find.
The "from __future__ import division" is unnecessary, since you're never using the single / operator (// doesn't need the future statement). Also note that future statements should come *after* a module's docstring (for future reference :-).
When I run test_strptime.py, I get one failure:

\====================================================================== FAIL: Test TimeRE.pattern. ----------------------------------------------------------------------

Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern

self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|(
\d))") != -1, "did not find 'd' directive pattern string
'%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line
262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string
'(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d|
\d)'

I haven't looked into this deeper.

Back to you...

brettcannon commented 22 years ago

Logged In: YES user_id=357491

God I wish I could delete those old files! Poor Neal Norwitz was nice enough to go over my code once to help me make it sure it was up for being included in the stdlib, but he initially used an old version. Thankfully he was nice enough to look over the newer version at the time. But no, SF does not give me the priveleges to delete old files (and why is that? I am the creator of the patch; you would think I could manage my own files). I re-uploaded everything now. All files that specify they were uploaded 2002-07-17 are the newest files.

I am terribly sorry about this whole name mix-up. I have now fixed test_strptime.py to use _strptime. I completely removed the strptime import so that the strptime testing will go through time and thus test which ever version time will export.

I removed the __future import. And thanks for the piece of advice; I was taking the advice that __future statements should come before code a little too far. =)

As for your error, that is because the test_strptime.py you are using is old. I originally had a test in there that checked to make sure the regex returned was the same as the one being tested for; that was a bad decision. So I went through and removed all hard-coded tests like that. Unfortunately the version you ran still had that test in there. SF should really let patch creators delete old files.

That's it this time. Now I await the next drama in this never-ending saga of trying to make a non-trivial contribution to Python. =)

gvanrossum commented 22 years ago

Logged In: YES user_id=6380

OK, deleting all old files as promised. All tests succeed. I think I'll check this version in (but it may be tomorrow, since I've got a few other things to take care of).

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Wonderful!

About the docs; do you want me to email Fred or upload a patched version of the docs for time fixed? And for removing the request in PEP-42, should I email Jeremy about it or Barry?

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Since I had the time, I went ahead and did a patch for libtime.tex that removes the comment saying that strptime fully relies on the C library and uploaded it.

d21744ff-f396-4c71-955e-7dbd2e886779 commented 22 years ago

Logged In: YES user_id=33168

Brett, I'm still following. It wasn't that bad. :-) Guido, let me know if you want me to do anything/check stuff in.

Docs are fine to upload here. I can change PEP-42 also.

gvanrossum commented 22 years ago

Logged In: YES user_id=6380

Thanks! All checked in.

jaraco commented 22 years ago

Logged In: YES user_id=599869

_strptime still doesn't handle case correctly. Although it matches case-insensitive, the lookups for month names are still case sensitive (since they must match exactly the month names returned by time.strftime). Allow me to demonstrate below.

>>> import _strptime as strptime
>>> strptime.strptime( 'Aug', '%b' )
(-1, 8, -1, -1, -1, -1, -1, -1, -1)
>>> strptime.strptime( 'AUG', '%b' )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Program Files\Python\Lib\site-
packages\strptime.py", line 410, in strptime
    month = locale_time.a_month.index(found_dict['b'])
ValueError: list.index(x): x not in list

The same result is encountered using any variation of 'aug' other than 'Aug'.

I imagine the solution to this problem is as simple as changing lines 408, 410, 430, and 432 to something like: month = locale_time.a_month.index(capwords(found_dict ['b'])) and adding from string import capwords to the header of the file. This is what I did in strptime v2.0.0 and the latest v2.1.5, and it has worked well.

As a general note, I believe the test suite should also test for cases not automatically generated. Many of the parse routines are going to be parsing times generated by systems other than time.strftime, to include variants such as the aforementioned case variant and other things like miscellaneous interspersed strings.

I will only suggest these changes, not implement them myself on this page, as I'm not intimately familiar with sourceforge or code customs, so I'll allow someone such as Guido or Brett to make the changes.

Regards, Jason

gvanrossum commented 22 years ago

Logged In: YES user_id=6380

Reopening to address Jason's bug report.

Brett, can you provide a patch?

brettcannon commented 22 years ago

Logged In: YES user_id=357491

Yep, I will patch it. I will add it to patch 593560 which has my last bugfix and code cleanup in it.

brettcannon commented 22 years ago

Logged In: YES user_id=357491

OK, all patched and attached to the other patch entry. I also discovered a bug in PyUnit and got to submit a patch for that to its patch DB. Such fun. =)

warsaw commented 22 years ago

Logged In: YES user_id=12800

Closing. Refer to patch 593560 for further details.

d21744ff-f396-4c71-955e-7dbd2e886779 commented 22 years ago

Logged In: YES user_id=33168

I'm assuming Barry really meant to keep this closed. :-)

python / cpython

Pure Python strptime() (PEP 42) #35391