jakeret / hope

HOPE: A Python Just-In-Time compiler for astrophysical computations
GNU General Public License v3.0

support for structured data #40

Open esheldon opened 9 years ago

esheldon commented 9 years ago

It is convenient to have data packed into structures. For example, if a calculation requires a large number of pieces of information, it is preferable to have the following (I realize this is a bit of a contrived example):

def func(sarray):
    for i in range(sarray.size):
        x = sarray['a'][i] + sarray['b'][i] + ... + sarray['z'][i]
        # do something with x

as opposed to

def func(a, b, c, d, ......, z):
    for i in range(a.size):
        x = a[i] + b[i] + ... + z[i]
        # do something with x

This could be solved by accepting structured arrays as input:

sarray = numpy.zeros(n, dtype=[('a','f8'),('b','f8'), ..., ('z','f8')])

res = func(sarray)

(edited for bugs)
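For illustration, here is a minimal runnable sketch of the structured-array version of the request. It is plain NumPy and is not compiled by HOPE, which according to this issue does not accept structured arrays. The three field names and the helper name `sum_fields` are hypothetical stand-ins for the elided a..z example above.

```python
import numpy


def sum_fields(sarray):
    """Sum every field of each record (plain NumPy, hypothetical helper)."""
    out = numpy.zeros(sarray.size)
    for i in range(sarray.size):
        x = 0.0
        # Loop over field names instead of passing each column separately
        for name in sarray.dtype.names:
            x += sarray[name][i]
        out[i] = x
    return out


n = 4  # arbitrary example length
sarray = numpy.zeros(n, dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
sarray['a'] = 1.0
sarray['b'] = numpy.arange(n)
sarray['c'] = 0.5

res = sum_fields(sarray)
```

The function takes a single argument however many fields the dtype has, which is the convenience being asked for.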

cosmo-ethz commented 9 years ago

Thanks for the input, @esheldon. In this particular example the number of parameters to pass is of course reduced, but on the other hand the equation becomes more difficult to read. Anyway, I do see cases where this could be convenient.

However, introducing structured arrays is a bit tricky:

I’m personally not a big fan of structured arrays (I don’t like the syntax sarray["a"]; I prefer the Pandas approach, sarray.a). Anyway, let me think about this; maybe there is a good solution. :)

esheldon commented 9 years ago

(sorry the formatting didn't go through in the email)

Structured arrays map directly to an array of C structures with the same datatypes. The array can be created with or without alignment of the structure:

import numpy

n = 10  # example array length (arbitrary)
dt = [('ra','f8'),('dec','f8'),('index','i4')]

# maps to packed C structures, no alignment
a = numpy.zeros(n, dtype=dt)

# maps to normal, unpacked (aligned) C structures
dtype = numpy.dtype(dt, align=True)
a = numpy.zeros(n, dtype=dtype)

For the packed version you would need to make sure the struct in C is also packed, but for the aligned version it is a direct map. For simplicity you could require that only arrays created with align=True are accepted.

In C, the Python expression sarray['a'][35] maps to sarray[35].a
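As a rough check of the packed versus aligned layouts described above, the sketch below inspects both dtypes with NumPy. The variable names are illustrative only. The packed size follows directly from the field sizes (8 + 8 + 4 bytes); the aligned size depends on the platform's alignment rules (typically 24 bytes where doubles are 8-byte aligned), so it is not hard-coded here.

```python
import numpy

dt = [('ra', 'f8'), ('dec', 'f8'), ('index', 'i4')]

packed = numpy.dtype(dt)               # no padding between or after fields
aligned = numpy.dtype(dt, align=True)  # padded like a default C struct

packed_size = packed.itemsize    # 8 + 8 + 4 bytes
aligned_size = aligned.itemsize  # may include trailing padding

# Byte offset of each field inside one aligned record
offsets = {name: aligned.fields[name][1] for name in aligned.names}

a = numpy.zeros(50, dtype=aligned)
a['ra'][35] = 12.5

# Field-then-index and index-then-field refer to the same element,
# which is what sarray[35].ra would address in C
same = bool(a['ra'][35] == a[35]['ra'])
```

A compiler backend could compare `aligned.itemsize` and the field offsets against `sizeof` and `offsetof` of the generated C struct before trusting the mapping.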

On notation:

Structured arrays are built into numpy, so they are in a sense fundamental. Codes like pyfits and fitsio return structured arrays (although pyfits wraps them).

Also, the sarray.a notation conflicts with Python attributes. For example, you can't have a field called "size" because that name is already used for the size of the array.
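The attribute conflict can be seen with NumPy's own `recarray`, which offers attribute-style field access. This is a small sketch with made-up field values: a field named "size" is still reachable by key, but the attribute returns the array's element count instead.

```python
import numpy

arr = numpy.zeros(3, dtype=[('ra', 'f8'), ('size', 'f8')])
arr['size'] = [1.5, 2.5, 3.5]

# View the same memory as a record array to get attribute access
rec = arr.view(numpy.recarray)

ra_column = rec.ra         # works: 'ra' does not clash with an ndarray attribute
n_elements = rec.size      # the ndarray attribute wins: number of records
size_field = rec['size']   # the field is only reachable with the key syntax
```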