poidasmith / xlloop

XLLoop Excel Function (UDF) Server
XL_TYPE_ARRAY ? #16

Open sdementen opened 10 years ago

sdementen commented 10 years ago

I am using xlloop with the Python server. On the Python side, I use numpy to handle large arrays/vectors. The elements of these arrays always have the same type (often doubles, but sometimes strings). With the current XL_TYPE_MULTI, the type is encoded for each element, so there is no easy way to process n elements in one shot. Moreover, numpy can convert an array directly to bytes, but because the type must be given for each element, these bytes cannot be used directly.

So, would it be possible to have an XL_TYPE_ARRAY that behaves like XL_TYPE_MULTI, except that the datatype is encoded once at the beginning of the stream? We would then have XL_TYPE_ARRAY, ROWS, COLS, XL_TYPE_of_element (NUM/INT/...), ELEMENT1, ELEMENT2, ..., ELEMENTn (with n = ROWS x COLS). This would ease the exchange of data through the sockets for Python (and R and Java may well be in the same situation).
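To make the proposed layout concrete, here is a minimal sketch of how such a payload could be built and read back with numpy. It is only an illustration of the idea above: `XL_TYPE_ARRAY` does not exist in the protocol, the numeric tag values are placeholders, and `encode_array` / `decode_array` are hypothetical helper names.

```python
import struct
import numpy as np

# Placeholder tag values: assumptions for this sketch, not taken from xlloop.
XL_TYPE_NUM = 1
XL_TYPE_ARRAY = 100  # hypothetical tag for the proposed type

# tag (1 byte), rows (int32), cols (int32), element type (1 byte), big-endian
HEADER = '>BiiB'
HEADER_SIZE = struct.calcsize(HEADER)  # 10 bytes, no padding with '>'


def encode_array(value):
    """Encode a 1-D or 2-D array of doubles with a single element-type tag."""
    arr = np.atleast_2d(np.asarray(value, dtype='>f8'))  # 1-D becomes one row
    if arr.ndim != 2:
        raise ValueError('only 1-D and 2-D arrays are supported')
    rows, cols = arr.shape
    header = struct.pack(HEADER, XL_TYPE_ARRAY, rows, cols, XL_TYPE_NUM)
    # The element bytes are written in one shot, in row-major order.
    return header + arr.tobytes(order='C')


def decode_array(payload):
    """Decode a payload produced by encode_array into a rows x cols array."""
    tag, rows, cols, elem_type = struct.unpack_from(HEADER, payload, 0)
    if tag != XL_TYPE_ARRAY or elem_type != XL_TYPE_NUM:
        raise ValueError('not an array of doubles')
    data = np.frombuffer(payload, dtype='>f8',
                         count=rows * cols, offset=HEADER_SIZE)
    return data.reshape(rows, cols)
```

With this layout a rows x cols array of doubles costs 10 header bytes plus 8 bytes per element, instead of 9 bytes per element with a type tag in front of each value.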

If such functionality already exists, where can I find documentation on it?

sebastien

sdementen commented 10 years ago

BTW, here is the code I am using to encode a numpy.array in the xlloop.py server (added to the XLCodec.encode method), which could be improved with XL_TYPE_ARRAY:

        elif isinstance(value, np.ndarray):
            socket.send(struct.pack('B', XL_TYPE_MULTI))        # this could be a XL_TYPE_ARRAY 
            sh = value.shape
            rows = sh[0]
            value = value.reshape((-1,))
            socket.send(struct.pack('>i', rows))
            if len(sh) == 1:
                # one dimensional
                socket.send(struct.pack('>i', 0)) # zero cols
            else:
                # two dimensional
                assert len(sh)==2
                cols = sh[1]
                socket.send(struct.pack('>i', cols))
            # prepare the endianness conversion and add the type byte before each element
            result = np.zeros(dtype=[('type', 'i1'), ('data', '>f8')], shape=len(value))
            result["type"] = XL_TYPE_NUM
            result["data"] = value
            socket.send(result.tobytes())  # tobytes() replaces the deprecated tostring()
            # with no endianness conversion and a single type before the stream of an XL_TYPE_ARRAY, this would simplify to
            #socket.send(struct.pack('B', XL_TYPE_NUM))
            #socket.send(value.tobytes())
            # ... saving memory and time

mnar53 commented 8 years ago

Assuming we are interested only in arrays of doubles, I think the encoding can be streamlined to:

    def rowcol(A):
        a = A.shape
        rank = len(a)
        if rank == 2:
            return (a[0], a[1])
        elif rank == 1:
            return (a[0], 1)  # a column
        else:
            return (0, 0)

    def sendDoubleArray(socket, value):
        (rows, cols) = rowcol(value)
        socket.send(struct.pack('B', XL_TYPE_MULTI))
        socket.send(struct.pack('>i', rows))
        if rows == 0:
            socket.send(struct.pack('>i', 0))  # zero cols
        else:
            socket.send(struct.pack('>i', cols))
            # elements follow only when the header announces a non-empty array
            for x in numpy.nditer(value, order='C'):
                socket.send(struct.pack('B', XL_TYPE_NUM))
                socket.send(struct.pack('>d', x))

On the other side, the decoding of an XL_TYPE_MULTI might be:

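    # ....
    elif type == XL_TYPE_MULTI:
        rows = decodeInt(sockt.recv(4))
        cols = decodeInt(sockt.recv(4))
        if cols == 0 or rows == 0:
            return []
        ##################
        print('DECODING ARRAY')
        # peek at the type byte of the first element without consuming it
        type = ord(sockt.recv(1, socket.MSG_PEEK))
        if type == XL_TYPE_NUM:
            # note: recv may return fewer bytes than requested; a robust
            # version would loop until all 9*rows*cols bytes have arrived
            return decodeDoubleArray(rows, cols, sockt.recv(9*rows*cols))
        elif type == XL_TYPE_STR:
            return decodeStringArray(rows, cols, sockt)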

with:

    def decodeDoubleArray(rows, cols, buff):
        a = numpy.zeros((rows*cols))
        k = 1  # skip the type byte of the first element
        idx = 0
        for i in range(rows):
            for j in range(k, k+9*cols, 9):
                a[idx] = struct.unpack_from('>d', buff, j)[0]
                idx += 1
            k += 9*cols
        return a.reshape(rows, cols)
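As a possible alternative to the element-by-element loop, the same 9-byte records (one type byte followed by one big-endian double) could be read in a single step with a numpy structured dtype. This is only a sketch: the value of `XL_TYPE_NUM` is a placeholder, `decode_double_array` is a hypothetical name, and `buff` is assumed to already hold all `9*rows*cols` bytes.

```python
import struct
import numpy as np

XL_TYPE_NUM = 1  # placeholder value, assumed for this sketch

# One record per element: a 1-byte type tag followed by a big-endian double.
# Structured dtypes are packed by default, so each record is exactly 9 bytes.
ELEMENT = np.dtype([('type', 'u1'), ('data', '>f8')])


def decode_double_array(rows, cols, buff):
    """Decode rows*cols tagged doubles from buff into a rows x cols array."""
    records = np.frombuffer(buff, dtype=ELEMENT, count=rows * cols)
    if not np.all(records['type'] == XL_TYPE_NUM):
        raise ValueError('payload contains non-numeric elements')
    # Copy the strided big-endian field into a contiguous native-endian array.
    return records['data'].astype(float).reshape(rows, cols)


# Usage: build a 2 x 2 payload the way the per-element encoder would.
buff = b''.join(struct.pack('>Bd', XL_TYPE_NUM, x) for x in (1.0, 2.0, 3.0, 4.0))
print(decode_double_array(2, 2, buff))
```

Checking the `type` field for every record also catches mixed arrays, which peeking only at the first element would miss.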