support for structured data

esheldon commented 9 years ago

It is convenient to have data packed into structures. For example, if a calculation requires a large number of pieces of information, it is preferable to write the following (I realize this is a bit of a contrived example):

def func(sarray):
    for i in range(sarray.size):
        # one term per field, 'a' through 'z' (fields 'c' through 'y' elided)
        x = sarray['a'][i] + sarray['b'][i] + sarray['z'][i]
        # do something with x

as opposed to

def func(a, b, z):  # in practice one argument per field, a through z
    for i in range(a.size):
        x = a[i] + b[i] + z[i]  # plus every field from c through y
        # do something with x

This could be solved by accepting structured arrays as input:

from numpy import zeros
# one float64 field per letter, 'a' through 'z'
sarray = zeros(n, dtype=[(c, 'f8') for c in 'abcdefghijklmnopqrstuvwxyz'])

res = func(sarray)

(edited for bugs)
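A minimal runnable sketch of this proposal in plain NumPy (outside HOPE), assuming float64 fields named 'a' through 'z'. The accumulation into `total` is a hypothetical stand-in for "do something with x". The point it shows is that the function needs only the one structured argument, with each column looked up by name:

```python
import string

import numpy


def func(sarray):
    """Combine every field of every record, looping record by record."""
    total = 0.0
    names = sarray.dtype.names  # ('a', 'b', ..., 'z')
    for i in range(sarray.size):
        # x combines all 26 fields of record i
        x = sum(sarray[name][i] for name in names)
        total += x  # hypothetical stand-in for "do something with x"
    return total


n = 4
dt = [(name, 'f8') for name in string.ascii_lowercase]
sarray = numpy.zeros(n, dtype=dt)
sarray['a'] = 1.0
sarray['z'] = 2.0

res = func(sarray)
print(res)  # 12.0: four records, each contributing 1.0 + 2.0
```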

cosmo-ethz commented 9 years ago

Thanks for the input @esheldon. In this particular example the number of parameters to pass is of course reduced, but on the other hand the equation becomes more difficult to read. Anyway, I do see cases where this could be convenient.

However, introducing structured arrays is a bit tricky:

- HOPE doesn’t support string literals, which would be required to access the columns.
- NumPy’s structured arrays allow the user to define arrays with different data types per column, something that is not possible in pure C.

I’m personally not a big fan of structured arrays (I don’t like the syntax sarray[“a”] and prefer the Pandas approach, sarray.a). Anyway, let me think about this; maybe there is a good solution. :)

esheldon commented 9 years ago

(Sorry, the formatting didn't go through in the email.)

Structured arrays map directly to an array of C structures with the same data types. The array can be created with or without alignment of the structure:

import numpy

dt = [('ra', 'f8'), ('dec', 'f8'), ('index', 'i4')]

# maps to packed C structures, no alignment
a = numpy.zeros(n, dtype=dt)

# maps to normal, unpacked C structures (fields padded to their alignment)
dtype = numpy.dtype(dt, align=True)
a = numpy.zeros(n, dtype=dtype)

For the packed version you would need to make sure the struct in C is also packed, but for the aligned version it is a direct map. For simplicity, you could require arrays created with align=True.
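To check the packed/aligned distinction concretely, the sketch below compares NumPy's layout for this `dt` with `ctypes.Structure` declarations standing in for the C structs. The `PackedObj`/`AlignedObj` classes are hypothetical illustrations, not part of HOPE. `_pack_ = 1` mimics a packed C struct; without it, `ctypes` uses the platform's native alignment, as a C compiler would. The sizes in the comments assume a typical 64-bit platform.

```python
import ctypes

import numpy

dt = [('ra', 'f8'), ('dec', 'f8'), ('index', 'i4')]
packed = numpy.dtype(dt)
aligned = numpy.dtype(dt, align=True)


class PackedObj(ctypes.Structure):
    # equivalent of a packed C struct: no padding anywhere
    _pack_ = 1
    _fields_ = [('ra', ctypes.c_double),
                ('dec', ctypes.c_double),
                ('index', ctypes.c_int32)]


class AlignedObj(ctypes.Structure):
    # ordinary C struct: padded to the natural alignment of its fields
    _fields_ = [('ra', ctypes.c_double),
                ('dec', ctypes.c_double),
                ('index', ctypes.c_int32)]


for name in ('ra', 'dec', 'index'):
    # dtype.fields maps each name to (field dtype, byte offset)
    assert packed.fields[name][1] == getattr(PackedObj, name).offset
    assert aligned.fields[name][1] == getattr(AlignedObj, name).offset

print(packed.itemsize, ctypes.sizeof(PackedObj))    # 20 20
print(aligned.itemsize, ctypes.sizeof(AlignedObj))  # 24 24 on typical 64-bit platforms
```

The only difference between the two layouts here is the four bytes of trailing padding that make each aligned record a multiple of the 8-byte `double` alignment.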

In C, the Python expression sarray['a'][35] maps to sarray[35].a.
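This index mapping can be demonstrated from Python by viewing an aligned structured array's buffer as a C array of structs through `ctypes`. This is a sketch: the `Obj` struct and the field values are hypothetical. Element 35 of the 'ra' column and field `ra` of struct 35 are the same eight bytes, so no copy is involved:

```python
import ctypes

import numpy


class Obj(ctypes.Structure):
    # mirrors: struct Obj { double ra; double dec; int32_t index; };
    _fields_ = [('ra', ctypes.c_double),
                ('dec', ctypes.c_double),
                ('index', ctypes.c_int32)]


dtype = numpy.dtype([('ra', 'f8'), ('dec', 'f8'), ('index', 'i4')], align=True)
sarray = numpy.zeros(100, dtype=dtype)
sarray['ra'] = numpy.arange(100) * 0.5
sarray['index'] = numpy.arange(100)

# reinterpret the same memory (no copy) as a C array of 100 structs
c_array = (Obj * sarray.size).from_buffer(sarray)

print(sarray['ra'][35], c_array[35].ra)        # 17.5 17.5
print(sarray['index'][35], c_array[35].index)  # 35 35

# writes through either view are visible in the other
c_array[35].dec = -1.25
print(sarray['dec'][35])  # -1.25
```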

Notation:

Structured arrays are built into NumPy, so they are in a sense fundamental. Codes like pyfits and fitsio return structured arrays (although pyfits wraps them).

Also, the sarray.a notation conflicts with Python attributes. For example, you can't have a field called "size", because that name is already used for the size of the array.
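NumPy's own attribute-style interface, `numpy.recarray`, shows this conflict in practice (a sketch; the field names are hypothetical). A field is reachable as an attribute only when its name doesn't collide with an existing `ndarray` attribute; a field called `size` is shadowed by the array's element count and still has to be accessed with `['size']`:

```python
import numpy

dt = [('a', 'f8'), ('size', 'f8')]
r = numpy.zeros(3, dtype=dt).view(numpy.recarray)
r['size'] = [1.5, 2.5, 3.5]

print(r.a)        # [0. 0. 0.]  (the field, via attribute access)
print(r.size)     # 3  (ndarray.size wins over the field)
print(r['size'])  # [1.5 2.5 3.5]  (the field, via item access)
```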

jakeret / hope

support for structured data #40