inaos / iron-array


Propagate source config during copies (Python) #418

Closed FrancescAlted closed 3 years ago

FrancescAlted commented 3 years ago

Currently, ia.copy() does not correctly propagate the source array config to the destination: https://github.com/inaos/iron-array-python/blob/develop/iarray/iarray_container.py#L87-L91

The issue here is that ia.copy() accepts a cfg parameter so that the user can modify the source config before doing the copy (for example, ia.save() is a copy with an overridden urlpath). Currently there is no way to access the config of an existing array from the Python API.

Hence one should create a new ironArray C function that produces an iarray_config_t struct from a context. From there, one should add a new property to the Container() class (say, config) that uses the iarray_config_t struct to create an ia.Config object, which can then be used to properly propagate properties (tongue twister intended).

aleixalcacer commented 3 years ago

Also, the config stack needs to be replaced with a combination of a default config and a diff stack (Python dicts). In this way, the default settings could be changed and the differences could be applied to the new settings.

This is useful in the copy function: we can temporarily change the default configuration to the configuration of the source array, so the copy keeps the source configuration.
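A minimal sketch of the "default config plus diff stack" idea, using plain Python dicts. Everything here is hypothetical (the `ConfigStack` class, the `copy_config` helper, and the keys in `DEFAULTS` are illustrative names, not part of the ironArray API); it only shows how swapping the defaults for the source config and then applying overrides could work.

```python
from contextlib import contextmanager

# Hypothetical defaults; these keys are illustrative, not ironArray's real settings.
DEFAULTS = {"chunks": None, "blocks": None, "clevel": 5, "urlpath": None}


class ConfigStack:
    """A default config plus a stack of diffs (dicts holding only overridden keys)."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._diffs = []

    def current(self):
        # Start from the defaults and apply every diff, oldest first.
        cfg = dict(self._defaults)
        for diff in self._diffs:
            cfg.update(diff)
        return cfg

    @contextmanager
    def override(self, **diff):
        # Push a diff for the duration of the block, then pop it.
        self._diffs.append(diff)
        try:
            yield self.current()
        finally:
            self._diffs.pop()

    @contextmanager
    def defaults_from(self, source_cfg):
        # Temporarily replace the defaults (e.g. with a source array's config).
        saved = self._defaults
        self._defaults = dict(source_cfg)
        try:
            yield
        finally:
            self._defaults = saved


def copy_config(stack, source_cfg, **overrides):
    """Config for a copy: the source config as defaults, user overrides on top."""
    with stack.defaults_from(source_cfg):
        with stack.override(**overrides) as cfg:
            return cfg
```

With this layout, something like ia.save() would reduce to calling `copy_config(stack, source_cfg, urlpath=...)`: every other setting comes from the source, and the global defaults are restored afterwards.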

FrancescAlted commented 3 years ago

Also, when copying views, it would be great if we could propagate the properties of the original array. For example:

import iarray as ia

precip = ia.open("precip-3m-optimal.iarr")
# the next operation should preserve properties of original `precip`
precip[0].copy()

FrancescAlted commented 3 years ago

And the same goes for the evaluation of functions: the outcome should have the same chunks and blocks as the operands (if they are equal), both for consistency and, very importantly, for efficiency reasons.

For example, right now we have:

import numpy as np
import iarray as ia

e = ia.zeros((10000, 10000), chunks=(1000, 1000), blocks=(100, 100), dtype=np.float64)
e3_expr = e + 3
e3 = e3_expr.eval()
print(e3.info)

gives this output:

type   : IArray
shape  : (10000, 10000)
chunks : (4096, 4096)
blocks : (128, 256)
cratio : 4760.84

As said, this is sub-optimal because:

1) The user ends up getting an array with different properties than the original.
2) The computation is slower because ironArray needs to decompress complete chunks prior to doing the computation (instead of making more efficient use of the prefilters and decompressing just blocks).
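The rule proposed above (reuse the operands' chunks and blocks when they all agree) could be sketched as follows. `Layout` and `common_layout` are hypothetical names used only for illustration; they are not part of the ironArray API, and a real implementation would read chunks and blocks from the operand arrays themselves.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Layout(NamedTuple):
    """Hypothetical stand-in for the partitioning of an operand."""
    chunks: Tuple[int, ...]
    blocks: Tuple[int, ...]


def common_layout(operands: Sequence[Layout]) -> Optional[Layout]:
    """Return the layout shared by all operands, or None if they differ."""
    if not operands:
        return None
    first = operands[0]
    # Only propagate when every operand has identical chunks and blocks.
    return first if all(op == first for op in operands[1:]) else None
```

When `common_layout` returns None, the evaluator would fall back to whatever automatic partitioning it uses today; otherwise the result would be created with the operands' own chunks and blocks.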

martaiborra commented 3 years ago

The initial issue of propagating the config params of the original array when doing a copy has already been fixed in https://github.com/inaos/iron-array-python/pull/121. For the evaluation of an expression, a new separate issue has been created (#463).