thouis / numpy-trac-migration

numpy Trac to github issues migration

Numpy mrecords - unpickle causes unhandled win32 exception (migrated from Trac #897) #1504

Open thouis opened 12 years ago

thouis commented 12 years ago

Original ticket http://projects.scipy.org/numpy/ticket/897, reported 2008-08-29 by Trac user chrisshucksmith, assigned to @pierregm.

Putting a datetime object into a masked record array (which makes one or more fields of the dtype '|O4') seems to trigger a bug in unpickling. The result is either a TypeError, "object pickle not returning list", or a sudden-death python.exe crash.
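A quick way to see why a datetime produces an '|O4' field: a sketch, assuming Python 3 and a current NumPy rather than the Python 2.5 setup in this report. Building a record array from a datetime column gives an object ('O') field whose item size is one pointer. That is presumably 4 bytes on the 32-bit Windows build used here, hence '|O4'. The field names and sample values are hypothetical.

```python
from datetime import datetime

import numpy as np

# Hypothetical sample: one record with a datetime column and a float column.
rec = np.rec.fromrecords([(datetime(2008, 6, 5), 1.0)], names=['Dates', 'Value'])

dates_field = rec.dtype['Dates']
print(dates_field.kind)      # 'O': the datetime column is stored as Python object pointers
print(dates_field.itemsize)  # size of one object pointer on this build (4 on a 32-bit build)
```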

'''Crash:'''

import os
from pickle import Pickler, Unpickler, HIGHEST_PROTOCOL
from StringIO import StringIO
from numpy import array, ones, zeros
from numpy.ma import mrecords
from datetime import datetime

cache_dir = os.curdir

rows = []
rowmasks = []
for i in xrange(5):    
    x = datetime(2008,6,5)
    rows.append([x,x,x,x,x])
    mask = zeros([5], 'bool')
    mask[2] = True
    rowmasks.append(mask)

print 'data:\n', rows
print 'Mask:\n', rowmasks 

recarr = mrecords.fromrecords(rows, names=['Ones','Twos', 'Threes', 'Fours', 'Fives'], mask=rowmasks)

print 'Records:'
print recarr.dtype
print recarr[0]
print recarr.Twos
print recarr.Threes

print 'Pickling'
sio = StringIO()
p = Pickler(sio, HIGHEST_PROTOCOL)
p.dump(recarr)
print 'Unpickling'
sio.seek(0)
u = Unpickler(sio)
recarr2 = u.load()
print recarr2[0]

'''Exception case:'''

Traceback (most recent call last):
  File "C:\workspace\eclipse\hacking\src\pycrash.py", line 33, in <module>
    recarr2 = u.load()
  File "c:\python25\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\python25\lib\pickle.py", line 1217, in load_build
    setstate(state)
  File "C:\Python25\Lib\site-packages\numpy\ma\mrecords.py", line 520, in __setstate__
    ndarray.__setstate__(self, (shp, typ, isf, raw))
TypeError: object pickle not returning list

Code:

import os
from pickle import Pickler, Unpickler, HIGHEST_PROTOCOL
from StringIO import StringIO
from numpy import array, ones, zeros
from numpy.ma import mrecords
from datetime import datetime

cache_dir = os.curdir

rows = []
rowmasks = []
for i in xrange(6):
    rows.append([datetime(2008, 6, 5)])
    rowmasks.append(zeros([1], 'bool'))

print 'data:\n', rows
print 'Mask:\n', rowmasks 

recarr = mrecords.fromrecords(rows, names=['Dates'], mask=rowmasks)

print 'Records:'
print recarr.dtype
print recarr[0]
print recarr.Dates

print 'Pickling'
sio = StringIO()
p = Pickler(sio, HIGHEST_PROTOCOL)
p.dump(recarr)
print 'Unpickling'
sio.seek(0)
u = Unpickler(sio)
recarr2 = u.load()
print recarr2[0]
print recarr2[1]

I'm aware that neither of these cases represents good code; the datetime should properly be stored in a numerical representation. Tested with Python 2.5 and various NumPy versions, including numpy-1.1.1-py2.5.egg.

thouis commented 12 years ago

Comment in Trac by @pierregm, 2009-01-07

OK, there's indeed something broken here, but I can't fix it for the moment. Any array with an np.object field in its dtype will fail, no matter how simple the dtype is. It looks like the corresponding data should be exported as a list instead of a string in __reduce__. Fixing this could be problematic if there are several fields.
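To illustrate the list-versus-string point, here is a small sketch. It assumes Python 3 and a current NumPy, and pickle_payload is a hypothetical helper, not a NumPy API. Plain ndarray pickling already stores object-typed data (including a structured dtype with a single object field) as a list, while numeric data goes out as a raw byte string. The "object pickle not returning list" error in the traceback above is ndarray.__setstate__ rejecting exactly such a string for object data.

```python
from datetime import datetime

import numpy as np


def pickle_payload(arr):
    """Hypothetical helper: return the raw-data slot of ndarray's pickle state.

    ndarray.__reduce__() returns (reconstructor, args, state), where state is
    (version, shape, dtype, is_fortran, rawdata).
    """
    _reconstruct, _args, state = arr.__reduce__()
    return state[4]


floats = np.zeros(3)
objects = np.array([datetime(2008, 6, 5)] * 3, dtype=object)
mixed = np.zeros(2, dtype=[('Dates', object), ('Count', np.int64)])

print(type(pickle_payload(floats)).__name__)   # bytes: numeric data is dumped raw
print(type(pickle_payload(objects)).__name__)  # list: object pointers cannot be dumped raw
print(type(pickle_payload(mixed)).__name__)    # list: one object field is enough
```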

There's a workaround of some sorts, however:

  1. use the .torecords (now called .toflex) method to transform the MaskedArray into an ndarray with a flexible dtype.
  2. pickle the output.
  3. use the fromflex function to get a masked_array back from the unpickled output.
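The three steps above might look like this with Python 3 and a current NumPy. This is a sketch under stated assumptions: it uses a plain np.ma masked array with a structured dtype as a stand-in for the mrecords object from the report, and the field name and sample data are hypothetical.

```python
import pickle
from datetime import datetime

import numpy as np

# Hypothetical stand-in for the report's data: one object-typed 'Dates' field.
dtype = [('Dates', object)]
data = np.array([(datetime(2008, 6, 5),)] * 6, dtype=dtype)
mask = np.zeros(6, dtype=[('Dates', bool)])
mask['Dates'][2] = True
marr = np.ma.array(data, mask=mask)

# 1. Flatten the masked array into a plain ndarray with '_data' and '_mask' fields.
flex = marr.toflex()
# 2. Pickle the plain ndarray; NumPy pickles its object field as a list of items.
payload = pickle.dumps(flex, protocol=pickle.HIGHEST_PROTOCOL)
# 3. Unpickle and rebuild a masked array from the flexible-type array.
restored = np.ma.fromflex(pickle.loads(payload))
```

Because the object that actually gets pickled is an ordinary ndarray, it goes through NumPy's own list-based pickling of object fields and never reaches the MaskedRecords __setstate__ shown in the traceback.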
thouis commented 12 years ago

Comment in Trac by @mwiebe, 2011-03-23

It's a reference-counting error of some kind; here's a partial backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff09094b8 in PyArray_Item_XDECREF (data=0xd0a718 "!\001", descr=
    0xcdae90) at numpy/core/src/multiarray/refcount.c:71
71          Py_XDECREF(temp);
(gdb) bt
#0  0x00007ffff09094b8 in PyArray_Item_XDECREF (data=0xd0a718 "!\001", descr=
    0xcdae90) at numpy/core/src/multiarray/refcount.c:71
#1  0x00007ffff0909489 in PyArray_Item_XDECREF (data=0xd0a700 "", descr=
    0xcdae40) at numpy/core/src/multiarray/refcount.c:87
#2  0x00007ffff0909a6a in PyArray_XDECREF (mp=0xcd8e60)
    at numpy/core/src/multiarray/refcount.c:173
#3  0x00007ffff08cf5fc in array_dealloc (self=0xcd8e60)
    at numpy/core/src/multiarray/arrayobject.c:260
#4  0x0000003c890a3217 in subtype_dealloc (self=
    <MaskedRecords at remote 0xcd8e60>)
    at /usr/src/debug/Python-2.7/Objects/typeobject.c:1002
#5  0x0000003c890783ba in list_dealloc (op=0xcc93f8)
    at /usr/src/debug/Python-2.7/Objects/listobject.c:309