Closed geoffroy-destaintot closed 5 years ago
Thought we had an issue for this...
It's a wraparound thing, I think.
PRs are welcome.
Any pointers on how to fix this?
Step through the code: this hits Cython at some point (for the add), then again for the construction of a new Timestamp. I think it's crashing there.
I generated the stack trace and stepped through the code. I've isolated the problem to the subset of the trace I've attached. It crashes at the point where it tries to multiply "self.n" and "self._inc", within the delta property of the Tick class. Any suggestions on fixing this?
`> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(393)__radd__()
-> def __radd__(self, other):
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(394)__radd__()
-> return self.__add__(other)
(Pdb) s
--Call--
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2698)__add__()
-> def __add__(self, other):
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2699)__add__()
-> if isinstance(other, Tick):
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2704)__add__()
-> elif isinstance(other, ABCPeriod):
(Pdb) s
--Call--
> /home/bhaprayan/Workspace/pandas/pandas/types/generic.py(7)_check()
-> @classmethod
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/types/generic.py(9)_check()
-> return getattr(inst, attr, '_typ') in comp
(Pdb) s
--Return--
> /home/bhaprayan/Workspace/pandas/pandas/types/generic.py(9)_check()->False
-> return getattr(inst, attr, '_typ') in comp
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2706)__add__()
-> try:
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2707)__add__()
-> return self.apply(other)
(Pdb) s
--Call--
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2746)apply()
-> def apply(self, other):
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2748)apply()
-> if isinstance(other, (datetime, np.datetime64, date)):
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2749)apply()
-> return as_timestamp(other) + self
(Pdb) s
--Call--
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(35)as_timestamp()
-> def as_timestamp(obj):
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(36)as_timestamp()
-> if isinstance(obj, Timestamp):
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(37)as_timestamp()
-> return obj
(Pdb) s
--Return--
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(37)as_timestamp()->Timestam...0:00:00')
-> return obj
(Pdb) s
--Call--
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2738)delta()
-> @property
(Pdb) s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2740)delta()
-> return self.n * self._inc
(Pdb) s
OverflowError: 'Python int too large to convert to C long'
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2740)delta()
-> return self.n * self._inc
(Pdb) s
--Return--
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2740)delta()->None
-> return self.n * self._inc
(Pdb) s
--Call--
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(393)__radd__()
-> def __radd__(self, other):
(Pdb) `
So I think that multiplication needs a guard on overflow:
In [2]: np.iinfo(np.int64).max
Out[2]: 9223372036854775807
In [3]: np.int64(1000000)*np.int64(86400*1e9)
/Users/jreback/miniconda/bin/ipython:1: RuntimeWarning: overflow encountered in long_scalars
#!/bin/bash /Users/jreback/miniconda/bin/python.app
Out[3]: -5833720368547758080
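To make the wraparound above concrete, here is a minimal sketch in plain Python (no NumPy needed). Both `wrap_int64` and `checked_mul_int64` are hypothetical helpers written for illustration, not pandas or NumPy functions: the first mimics two's-complement wraparound, the second shows one way such a guard could look.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to what a signed 64-bit integer would hold."""
    return (value + 2**63) % 2**64 - 2**63


def checked_mul_int64(a: int, b: int) -> int:
    """Multiply two ints, raising OverflowError if the product does not fit in int64."""
    product = a * b  # Python ints are arbitrary precision, so this product is exact
    if not INT64_MIN <= product <= INT64_MAX:
        raise OverflowError(f"{a} * {b} does not fit in a signed 64-bit integer")
    return product


# Same operands as the session above: 1,000,000 days expressed in nanoseconds.
print(wrap_int64(1_000_000 * NS_PER_DAY))  # -5833720368547758080, the wrapped value

try:
    checked_mul_int64(1_000_000, NS_PER_DAY)
except OverflowError as exc:
    print("guarded:", exc)
```

The guard computes the exact product first and only then checks the bounds, which works here because Python integers never overflow on their own.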
First, I set a guard on the multiplication overflow. However, it's still stuck in a recursive loop: even after the OverflowError is caught, __radd__ still gets called.
`ipdb> s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2741)delta()
   2739     def delta(self):
   2740         try:
-> 2741             self.n * self._inc
   2742         except OverflowError:
   2743             raise
ipdb> s
OverflowError: 'Python int too large to convert to C long'
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2741)delta()
   2739     def delta(self):
   2740         try:
-> 2741             self.n * self._inc
   2742         except OverflowError:
   2743             raise
ipdb> s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2742)delta()
   2740         try:
   2741             self.n * self._inc
-> 2742         except OverflowError:
   2743             raise
   2744
ipdb> s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2743)delta()
   2741             self.n * self._inc
   2742         except OverflowError:
-> 2743             raise
   2744
   2745     @property
ipdb> s
--Return--
None
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2743)delta()
   2741             self.n * self._inc
   2742         except OverflowError:
-> 2743             raise
   2744
   2745     @property
ipdb> s
--Call--
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(393)__radd__()
    391         return NotImplemented
    392
--> 393     def __radd__(self, other):
    394         return self.__add__(other)
    395
ipdb> s
> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(394)__radd__()
    392
    393     def __radd__(self, other):
--> 394         return self.__add__(other)
    395
    396     def __sub__(self, other): `
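The loop in the trace above can be reproduced with a toy model. This is only a sketch under an assumption: that the left operand's addition swallows the OverflowError and returns `NotImplemented`, so Python falls back to the offset's `__radd__`, which adds again. The classes `ToyTick`, `SwallowingTimestamp` and `PropagatingTimestamp` are hypothetical stand-ins, not pandas code.

```python
class ToyTick:
    """Stand-in for an offset whose delta cannot be computed."""

    @property
    def delta(self):
        raise OverflowError("Python int too large to convert to C long")

    def __radd__(self, other):
        return self.apply(other)

    def apply(self, other):
        # Mirrors `as_timestamp(other) + self`: dispatches to other.__add__ again.
        return other + self


class SwallowingTimestamp:
    """Assumption: the overflow is swallowed and reported as NotImplemented."""

    def __add__(self, other):
        try:
            other.delta
        except OverflowError:
            return NotImplemented  # Python then tries other.__radd__(self)
        return self


class PropagatingTimestamp:
    """Lets the overflow escape, so the fallback to __radd__ never happens."""

    def __add__(self, other):
        other.delta  # OverflowError propagates to the caller
        return self


for ts_cls in (SwallowingTimestamp, PropagatingTimestamp):
    try:
        ts_cls() + ToyTick()
    except (RecursionError, OverflowError) as exc:
        print(ts_cls.__name__, "->", type(exc).__name__)
```

In this model a guard inside `delta` alone changes nothing, because the error is still raised there and still swallowed one level up; the loop only stops once the error is allowed to reach the caller.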
Looks like this issue has already been fixed: running the reproduction scenario now gives a clear exception:
OverflowError: the add operation between <100000000000000000000000000000000000000000000000000 * Days> and 2000-01-01 00:00:00 will overflow
Great.
Do you want to do a PR with some tests?
I put together a quick smoke test, and indeed it looks like things are generating exceptions like they should.
But two offsets, the FY5253Quarter and DateOffset cases, both take forever to fail, ~20s in one case, ~10s in the other, so something's different about them (I haven't given even a cursory glance).
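For anyone who wants to repeat that kind of measurement, here is a rough sketch of a timing helper. `time_to_failure` is a hypothetical name, and the operation below uses a standard-library `timedelta` as a stand-in; the slow pandas offsets would be substituted in the lambda.

```python
import datetime as dt
import time


def time_to_failure(operation, expected=OverflowError):
    """Run `operation`, returning (seconds elapsed, exception) once it raises `expected`."""
    start = time.perf_counter()
    try:
        operation()
    except expected as exc:
        return time.perf_counter() - start, exc
    raise AssertionError("operation completed without raising")


# Stand-in operation: an oversized timedelta multiplication.
elapsed, error = time_to_failure(lambda: dt.timedelta(days=1) * 100**25)
print(f"failed after {elapsed:.6f}s with {type(error).__name__}")
```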
This is already fixed in master, if someone would like to add tests in a PR.
Code Sample, a copy-pastable example if possible
In:
Out:
=> python crash
Fatal Python error: Cannot recover from stack overflow.
Current thread 0x00002b00 (most recent call first):
  File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2526 in delta
  File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2535 in apply
  File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2493 in __add__
  File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 390 in __radd__
  File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2535 in apply
  File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2493 in __add__
  File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 390 in __radd__
  ...
Expected Output
Satisfactory behaviour when using python timedeltas:
In:
Out:
=> python error
Traceback (most recent call last):
  File "C:/Users/geoffroy.destaintot/Documents/Local/Informatique/Projets/2016-08-django-debug/to_offset_bug.py", line 11, in <module>
    d + dt.timedelta(days=1)*100**25
OverflowError: Python int too large to convert to C long
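The behaviour described as satisfactory here can be checked with the standard library alone. This is a minimal sketch: the definition of `d` is an assumption (the original snippet is not shown above), chosen to match the 2000-01-01 date mentioned in the thread.

```python
import datetime as dt

d = dt.datetime(2000, 1, 1)  # assumed starting date, not taken from the original snippet

try:
    d + dt.timedelta(days=1) * 100**25
except OverflowError as exc:
    # An ordinary, catchable exception rather than an interpreter crash.
    caught = exc
    print("OverflowError:", exc)
```

The exact error message differs between Python versions, so the sketch relies only on the exception type.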
output of pd.show_versions()
(same behaviour with pandas 0.17.1, 0.16.2, 0.15.2)
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.6
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None