pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

timedelta string conversion requires two-digit hour value #9570

Closed sammosummo closed 9 years ago

sammosummo commented 9 years ago

Timedelta('00:00:00') works fine, whereas Timedelta('0:00:00') raises an error. I'm unsure whether to call this a bug, but under some circumstances the datetime module in pure Python produces timedelta strings without the leading 0.
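For illustration, here is a minimal sketch of how the standard library produces such strings. It uses only `datetime.timedelta` and `str()`; the variable names are made up for the example.

```python
from datetime import timedelta

# str() on a timedelta shorter than ten hours yields a single-digit hour field.
zero = str(timedelta(0))
ninety_minutes = str(timedelta(hours=1, minutes=30))

print(zero)            # 0:00:00
print(ninety_minutes)  # 1:30:00
```

Strings like these are what the two-digit-hour requirement rejects.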

jreback commented 9 years ago

xref #8863

this would be a relatively simple fix to adjust the regex to parse this format.

pull-requests are welcome!

chrisgilmerproj commented 9 years ago

Hey, I'm gonna try to work on this.

chrisgilmerproj commented 9 years ago

Here's me recreating the issue:

In [1]: from pandas import Timedelta

In [2]: Timedelta('0:00:00')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-d6c00d815c30> in <module>()
----> 1 Timedelta('0:00:00')

/Users/cgilmer/Projects/pandas-cgilmer/pandas/tslib.pyx in pandas.tslib.Timedelta.__new__ (pandas/tslib.c:30144)()
   1752         elif util.is_string_object(value):
   1753             from pandas import to_timedelta
-> 1754             value = to_timedelta(value,unit=unit,box=False)
   1755         elif isinstance(value, timedelta):
   1756             value = convert_to_timedelta64(value,'ns',False)

/Users/cgilmer/Projects/pandas-cgilmer/pandas/tseries/timedeltas.pyc in to_timedelta(arg, unit, box, coerce)
     70 
     71     # ...so it must be a scalar value. Return scalar.
---> 72     return _coerce_scalar_to_timedelta_type(arg, unit=unit, box=box, coerce=coerce)
     73 
     74 _unit_map = {

/Users/cgilmer/Projects/pandas-cgilmer/pandas/tseries/timedeltas.pyc in _coerce_scalar_to_timedelta_type(r, unit, box, coerce)
    144 
    145         # we are already converting to nanoseconds
--> 146         converter = _get_string_converter(r, unit=unit)
    147         r = converter()
    148         unit='ns'

/Users/cgilmer/Projects/pandas-cgilmer/pandas/tseries/timedeltas.pyc in _get_string_converter(r, unit)
    263 
    264     # no converter
--> 265     raise ValueError("cannot create timedelta string converter for [{0}]".format(r))
    266 

ValueError: cannot create timedelta string converter for [0:00:00]

chrisgilmerproj commented 9 years ago

Here's my attempted fix:

diff --git a/pandas/tseries/timedeltas.py b/pandas/tseries/timedeltas.py
index 91e75da..7333d0b 100644
--- a/pandas/tseries/timedeltas.py
+++ b/pandas/tseries/timedeltas.py
@@ -119,7 +119,7 @@ def _validate_timedelta_unit(arg):
 _short_search = re.compile(
     "^\s*(?P<neg>-?)\s*(?P<value>\d*\.?\d*)\s*(?P<unit>d|s|ms|us|ns)?\s*$",re.IGNORECASE)
 _full_search = re.compile(
-    "^\s*(?P<neg>-?)\s*(?P<days>\d*\.?\d*)?\s*(days|d|day)?,?\s*\+?(?P<time>\d{2}:\d{2}:\d{2})?(?P<frac>\.\d+)?\s*$",re.IGNORECASE)
+    "^\s*(?P<neg>-?)\s*(?P<days>\d*\.?\d*)?\s*(days|d|day)?,?\s*\+?(?P<time>\d{1,2}:\d{2}:\d{2})?(?P<frac>\.\d+)?\s*$",re.IGNORECASE)
 _nat_search = re.compile(
     "^\s*(nat|nan)\s*$",re.IGNORECASE)
 _whitespace = re.compile('^\s*$')
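To see the effect of the quantifier change in isolation, here is a rough sketch using only the standard `re` module. It reproduces just the time-of-day fragment of `_full_search` from the diff, not the whole pattern; `OLD_TIME`, `NEW_TIME`, and `accepts` are hypothetical names chosen for this example and are not part of pandas.

```python
import re

# Hypothetical stand-ins: only the time-of-day fragment from the diff,
# anchored so that the whole string has to match.
OLD_TIME = re.compile(r"^\d{2}:\d{2}:\d{2}$")    # hour must be two digits
NEW_TIME = re.compile(r"^\d{1,2}:\d{2}:\d{2}$")  # hour may be one or two digits


def accepts(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.match(text) is not None


samples = ("00:00:00", "0:00:00", "9:30:15")
results = {s: (accepts(OLD_TIME, s), accepts(NEW_TIME, s)) for s in samples}

for text, (old_ok, new_ok) in results.items():
    print(f"{text!r}: old={old_ok} new={new_ok}")
```

In the full pattern the optional `days` group can also consume leading digits, so it is worth checking how the relaxed hour field interacts with it on inputs such as '10:00:00'.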

chrisgilmerproj commented 9 years ago

Corrected output:

In [1]: from pandas import Timedelta

In [2]: Timedelta('0:00:00')
Out[2]: Timedelta('0 days 00:00:00')

chrisgilmerproj commented 9 years ago

@jreback - My PR is ready for you

jreback commented 9 years ago

closed by #9868