Open sdementen opened 10 years ago
BTW, here is the code I use to encode a numpy.array in the xlloop.py server (added to the XLCodec.encode method); it could be simplified with an XL_TYPE_ARRAY:
elif isinstance(value, np.ndarray):
    socket.send(struct.pack('B', XL_TYPE_MULTI))  # this could be a XL_TYPE_ARRAY
    sh = value.shape
    rows = sh[0]
    value = value.reshape((-1,))
    socket.send(struct.pack('>i', rows))
    if len(sh) == 1:
        # one dimensional
        socket.send(struct.pack('>i', 0))  # zero cols
    else:
        # two dimensional
        assert len(sh) == 2
        cols = sh[1]
        socket.send(struct.pack('>i', cols))
    # preparing conversion for endianness + adding the type before each element
    result = np.zeros(dtype=[('type', 'i1'), ('data', '>f8')], shape=len(value))
    result["type"] = XL_TYPE_NUM
    result["data"] = value
    socket.send(result.tobytes())  # tobytes() replaces the deprecated tostring()
    # if no endianness conversion and single type before the stream of a XL_TYPE_ARRAY, this would simplify to
    #socket.send(struct.pack('B', XL_TYPE_NUM))
    #socket.send(result.tobytes())
    # ... saving memory and time
Assuming we are only interested in arrays of doubles, I think the encoding can be streamlined to:
def rowcol(A):
    a = A.shape
    rank = len(a)
    if rank == 2:
        return (a[0], a[1])
    elif rank == 1:
        return (a[0], 1)  # a column
    else:
        return (0, 0)

def sendDoubleArray(socket, value):
    (rows, cols) = rowcol(value)
    socket.send(struct.pack('B', XL_TYPE_MULTI))
    socket.send(struct.pack('>i', rows))
    if rows == 0:
        socket.send(struct.pack('>i', 0))  # zero cols
    else:
        socket.send(struct.pack('>i', cols))
        for x in numpy.nditer(value, order='C'):
            socket.send(struct.pack('B', XL_TYPE_NUM))
            socket.send(struct.pack('>d', x))
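Both encoders above produce the same wire format: a 9-byte header followed by one 9-byte record per element (type byte plus big-endian double). As a rough sketch, the two ideas can be combined so the whole message is built in memory and written with a single `sendall`, which also avoids the case where `send` writes only part of the data. The function names `encode_double_array` and `send_double_array` are illustrative, and the constant values are assumptions that should be checked against xlloop.py.

```python
import struct
import numpy as np

# Assumed constants: check them against the values defined in xlloop.py.
XL_TYPE_NUM = 1
XL_TYPE_MULTI = 5

# One record per element: a 1-byte type tag followed by a big-endian double.
ELEMENT = np.dtype([('type', 'u1'), ('data', '>f8')])


def encode_double_array(value):
    """Return the XL_TYPE_MULTI bytes for a 1-D or 2-D array of doubles."""
    value = np.asarray(value, dtype=float)
    if value.ndim == 1:
        rows, cols = value.shape[0], 1  # treat a vector as a column
    elif value.ndim == 2:
        rows, cols = value.shape
    else:
        raise ValueError("only 1-D and 2-D arrays are supported")
    body = np.empty(rows * cols, dtype=ELEMENT)
    body['type'] = XL_TYPE_NUM
    body['data'] = value.reshape(-1)  # row-major (C) order
    header = struct.pack('>Bii', XL_TYPE_MULTI, rows, cols)
    return header + body.tobytes()


def send_double_array(sock, value):
    # sendall keeps writing until every byte is sent; send may stop early.
    sock.sendall(encode_double_array(value))
```

The per-element loop and the structured-dtype version should yield identical bytes, so either can be swapped in on the server side without changing the client.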
On the other side, the decoding of an XL_TYPE_MULTI might be:
....
elif type == XL_TYPE_MULTI:
    rows = decodeInt(sockt.recv(4))
    cols = decodeInt(sockt.recv(4))
    if cols == 0 or rows == 0:
        return []
    ##################
    type = ord(sockt.recv(1, socket.MSG_PEEK))
    if type == XL_TYPE_NUM:
        return decodeDoubleArray(rows, cols, sockt.recv(9*rows*cols))
    elif type == XL_TYPE_STR:
        return decodeStringArray(rows, cols, sockt)
with:
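def decodeDoubleArray(rows, cols, buff):
    a = numpy.zeros((rows*cols))
    k = 1  # skip the type byte of the first element
    idx = 0
    for i in range(rows):
        # each element takes 9 bytes: 1 type byte + 8 bytes of big-endian double
        for j in range(k, k+9*cols, 9):
            a[idx] = struct.unpack_from('>d', buff, j)[0]
            idx += 1
        k += 9*cols
    return a.reshape(rows, cols)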
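The decoding loop can probably be vectorized in the same way as the encoder, by viewing the buffer as records of (type byte, big-endian double). The sketch below is an assumption-laden illustration rather than the xlloop implementation: `recv_exact` and `decode_double_array` are hypothetical names, and the value of `XL_TYPE_NUM` should be checked against xlloop.py. The helper `recv_exact` is there because a single `recv(n)` call may return fewer than `n` bytes for large arrays.

```python
import struct
import numpy as np

XL_TYPE_NUM = 1  # assumed value; check xlloop.py

# One record per element: a 1-byte type tag followed by a big-endian double.
ELEMENT = np.dtype([('type', 'u1'), ('data', '>f8')])


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv call may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed before the array was complete")
        chunks.append(chunk)
        n -= len(chunk)
    return b''.join(chunks)


def decode_double_array(rows, cols, buff):
    """Decode rows*cols (type byte, big-endian double) records."""
    records = np.frombuffer(buff, dtype=ELEMENT, count=rows * cols)
    if not np.all(records['type'] == XL_TYPE_NUM):
        raise ValueError("array contains non-numeric elements")
    # Copy into native byte order and reshape to the declared size.
    return records['data'].astype(float).reshape(rows, cols)
```

With these helpers, the numeric branch would read `decode_double_array(rows, cols, recv_exact(sockt, 9*rows*cols))`. Checking every type byte, instead of peeking only at the first one, guards against mixed-type arrays.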
I am using xlloop with the Python server. On the Python side, I use numpy to handle large arrays and vectors. The elements of these arrays always share the same type (often doubles, sometimes strings). With the current XL_TYPE_MULTI, the type is encoded for each element, so there is no easy way to process n elements in one shot. Moreover, numpy can convert an array directly to bytes, but because the type must precede each element, we cannot use these bytes directly.
So, would it be possible to have an XL_TYPE_ARRAY that behaves like XL_TYPE_MULTI, except that the data type is encoded once at the beginning of the stream? We would then have XL_TYPE_ARRAY, ROWS, COLS, XL_TYPE_of_element (NUM/INT/...), ELEMENT1, ELEMENT2, ..., ELEMENTn (with n = ROWS x COLS). This would ease the exchange of data through the sockets for Python (and probably for R and Java as well).
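To make the proposal concrete, here is a minimal sketch of how such a layout could be encoded and decoded for doubles. It assumes the message starts with its own tag, and everything specific to it is hypothetical: `XL_TYPE_ARRAY` does not exist in the protocol, the tag value 10 is a placeholder, and `encode_array` / `decode_array` are illustrative names. The value of `XL_TYPE_NUM` is also an assumption to check against xlloop.py.

```python
import struct
import numpy as np

XL_TYPE_NUM = 1     # assumed value; check xlloop.py
XL_TYPE_ARRAY = 10  # hypothetical tag: the proposed type does not exist yet

# tag (1 byte), rows (4), cols (4), element type (1): 10 bytes in total
HEADER = struct.Struct('>BiiB')


def encode_array(value):
    """Encode a 2-D array of doubles with a single element-type byte."""
    value = np.asarray(value, dtype=float)
    if value.ndim != 2:
        raise ValueError("expected a 2-D array")
    rows, cols = value.shape
    header = HEADER.pack(XL_TYPE_ARRAY, rows, cols, XL_TYPE_NUM)
    # One byte-order conversion for the whole array, emitted in row-major order.
    return header + value.astype('>f8').tobytes()


def decode_array(buff):
    """Decode a message produced by encode_array."""
    tag, rows, cols, elem_type = HEADER.unpack_from(buff, 0)
    if tag != XL_TYPE_ARRAY or elem_type != XL_TYPE_NUM:
        raise ValueError("not an array of doubles")
    data = np.frombuffer(buff, dtype='>f8', count=rows * cols,
                         offset=HEADER.size)
    return data.astype(float).reshape(rows, cols)
```

For n elements this message takes 10 + 8n bytes instead of the 9 + 9n bytes of XL_TYPE_MULTI, and the payload maps directly onto a numpy buffer without a per-element type byte.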
If such functionality already exists, where can I find documentation on it?
sebastien