inaos / iron-array


[ENH] Support for a larger range of NumPy data types #529

Closed FrancescAlted closed 2 years ago

FrancescAlted commented 2 years ago

Currently we only support the float/double, int[8,16,32,64]/uint[8,16,32,64] and bool types.

One company (Kisters) would be interested in having support for more data types, like datetime64. As ironArray does computations, supporting such types natively would mean that we should also support datetime operations, which is clearly out of our scope.

Instead, I propose to add attrs["_v_dtype"] and attrs["iarr_urlpath"] so that, if they exist and the user wants a NumPy conversion (e.g. via IArray.data), there is an extra dtype encoding step. An array that has these attributes will be called 'a view', and it can be built from an existing IArray object (via e.g. iarr.astype(attrs["_v_dtype"])).

There are two possible scenarios for doing the conversion:

1) Compatible dtypes (e.g. int64 -> datetime64 or timedelta64): these do not require an active conversion other than creating a compatible NumPy container when using ia.iarray2numpy() (or iarr.data which calls the former).

2) Incompatible dtypes (e.g. int64 -> float64): these require an active conversion via postfilters. As there are many different kinds of conversions, perhaps it is a good idea to use C macros (or some other tool) in the inner loop of the conversion to keep the code as simple as possible.
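The difference between the two scenarios can be illustrated with plain NumPy. This is only a sketch of the idea, not ironArray code: the array contents and the `datetime64[s]` unit are assumptions chosen for the example, and the postfilter machinery of case 2 is stood in for by `ndarray.astype`.

```python
import numpy as np

# Stored values: int64 counts of seconds since the Unix epoch (assumed data).
stored = np.array([0, 86_400, 172_800], dtype=np.int64)

# Case 1, compatible dtypes: int64 and datetime64 are both 8 bytes wide,
# so NumPy can reinterpret the same buffer without converting any value.
as_dates = stored.view("datetime64[s]")

# Case 2, incompatible dtypes: every value has to be converted,
# which produces a new buffer.
as_floats = stored.astype(np.float64)

print(as_dates)
print(np.shares_memory(stored, as_dates))   # True: same buffer, new container
print(np.shares_memory(stored, as_floats))  # False: values were converted
```

In case 1 the only work is building the right NumPy container around existing bytes, which is why no active conversion step is needed there.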

I have started a PR (that will probably be closed, because creating a new View subclass was more difficult than expected) at https://github.com/inaos/iron-array-python/pull/157/files# that shows how I addressed the conversion for case 1).

Another method that can be useful to implement is:

class IArray:
    ...

    def is_view(self):
        # An array is a view if it carries the "iarr_urlpath" attribute.
        return "iarr_urlpath" in self.attrs

so that users can easily see whether an array is a view or contains actual data.
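To make the proposal concrete, here is a minimal sketch of how the two attributes, `is_view()` and the extra dtype step in `.data` could fit together. `ArrayStandIn` is a hypothetical stand-in backed by a NumPy array, not the real IArray class; the `"<in-memory>"` value stored under `iarr_urlpath` is a placeholder, and only the compatible-dtype case 1) is handled.

```python
import numpy as np


class ArrayStandIn:
    """Hypothetical stand-in for IArray; not the ironArray API."""

    def __init__(self, values, attrs=None):
        self._values = np.asarray(values)
        self.attrs = dict(attrs or {})

    def is_view(self):
        # Same test as proposed above: a view is marked by "iarr_urlpath".
        return "iarr_urlpath" in self.attrs

    def astype(self, dtype):
        # Build a view over the same values and record the target dtype.
        # The "iarr_urlpath" value is a placeholder for this sketch.
        attrs = {
            "_v_dtype": str(np.dtype(dtype)),
            "iarr_urlpath": "<in-memory>",
        }
        return ArrayStandIn(self._values, attrs)

    @property
    def data(self):
        # Extra dtype step on NumPy conversion, applied only to views.
        if self.is_view():
            return self._values.view(self.attrs["_v_dtype"])
        return self._values


base = ArrayStandIn(np.array([0, 86_400], dtype=np.int64))
view = base.astype("datetime64[s]")

print(base.is_view(), view.is_view())  # False True
print(view.data.dtype)                 # datetime64[s]
```

The stored values are never touched: the view only carries metadata, and the dtype is applied when the user asks for a NumPy array.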

If this goes well, this can be the starting point for more generalized views, like slice views (e.g. iarr[1,:]) or computed views (e.g. ia1 * sin(ia2)). But step by step :-)

FrancescAlted commented 2 years ago

This work has been started in PR https://github.com/inaos/iron-array-python/pull/157. However, that PR can be much more ambitious than just dtype conversion, and it could be extended to support e.g. slices or computed expressions as views. Slices as views will be useful to remove the special handling required so far in the current implementation (as the new view would behave like a native IArray object).

martaiborra commented 2 years ago

Fixed in https://github.com/inaos/iron-array-python/pull/161