libdynd / dynd-python

Python exposure of dynd
http://libdynd.org

Possibly confusing error message: nd.array fails to encode np.arrays in lists + use case: ctbns #372

Open mpacer opened 8 years ago

mpacer commented 8 years ago

I was testing the functionality for moving from np.arrays to nd.arrays, and I'm surprised that the following fails:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-cb56525526cf> in <module>()
----> 1 nd.array([np.random.random(size=[2,2])])

dynd/nd/array.pyx in dynd.nd.array.array.__init__ (/Users/cocosci/Dropbox/Work/Resources/Repos/dynd-python/build/temp.macosx-10.10-x86_64-3.4/array.cxx:1390)()

TypeError: only length-1 arrays can be converted to Python scalars

Since the np.array has all of its striding data encoded internally, and np.arrays are just the non-ragged case of an nd.array, I feel like this should work. I imagine the issue is that it's ambiguous whether you want a 2 × 2 nd.array or a 1 × 1 nd.array whose element type is 2 × 2.

Even so, if that is the issue, the error should say something about that ambiguity.

For a use case, consider many clusters of continuous-time Markov processes, where subsets of nodes are tightly coupled (locally ergodic) and there are relatively few links between these node clusters. One might (in the vein of ctbns) want to define conditional intensity matrices for different values of discrete nodes. These are most easily stored as an n × n matrix associated with each cluster, with the large-scale dependencies between clusters described at a higher order of abstraction.

It would seem that the ragged array approach would work well for this: an nd.array with k top-level dimensions that describe the "out-going" state info from each cluster, and then inside each of the k clusters an n_i × n_i matrix defining the internal dynamics of that cluster, where n_i can be different for each cluster i.
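To make the layout concrete, here is a minimal sketch of the per-cluster storage described above, with a differently sized square matrix for each cluster. It uses plain Python lists only, so it does not depend on how NumPy or dynd treat ragged input. The names `random_intensity_matrix`, `cluster_sizes`, and `clusters` are hypothetical, and the matrices are generic random intensity matrices (non-negative off-diagonal rates, rows summing to zero), not a real ctbn model.

```python
import random

def random_intensity_matrix(n, rng):
    """Return an n x n intensity matrix as nested lists.

    Off-diagonal entries are non-negative rates; each diagonal entry is
    minus the sum of the other entries in its row, so every row sums to zero.
    """
    rows = []
    for i in range(n):
        # Diagonal starts at 0.0 so that sum(row) covers only the off-diagonals.
        row = [rng.random() if j != i else 0.0 for j in range(n)]
        row[i] = -sum(row)
        rows.append(row)
    return rows

rng = random.Random(0)
cluster_sizes = [2, 3, 4]  # hypothetical sizes for k = 3 clusters
clusters = [random_intensity_matrix(n, rng) for n in cluster_sizes]

# The per-cluster shapes survive because each matrix is stored separately.
shapes = [(len(m), len(m[0])) for m in clusters]
print(shapes)  # [(2, 2), (3, 3), (4, 4)]
```

What a ragged nd.array would add over this list is a single type that records those element shapes, rather than leaving them to be rediscovered by inspecting each element.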

mwiebe commented 8 years ago

I suspect you want this to become a 3 dimensional array with shape (1, 2, 2), just as it does in NumPy?

In [2]: np.array([np.random.random(size=[2,2])])
Out[2]: 
array([[[ 0.89883947,  0.75418759],
        [ 0.53712153,  0.15815001]]])

I agree this error message is not very helpful; we should do better.

mpacer commented 8 years ago

Well, where dynd would come in handy would be in capturing the shapes of the conditional intensity matrices as explicit datatypes:

test1 = np.array([np.random.random(size=[2,2])])
test2 = np.array([np.random.random(size=[2,2]), np.random.random(size=[3,3])])
test1.shape, test2.shape

which outputs:

((1, 2, 2), (2,))

This loses all the information about the underlying element shapes once you have two matrix elements that have different shapes.
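For anyone reproducing this today: as far as I know, recent NumPy releases reject ragged nested input like `test2` with a ValueError unless an object dtype is requested, so the sketch below builds the object array explicitly. It is only an illustration of the same point; the variable names `a`, `b`, and `ragged` are made up here.

```python
import numpy as np

a = np.random.random(size=[2, 2])
b = np.random.random(size=[3, 3])

# Allocate the object array first, then fill it, so NumPy never has to
# guess how deep to look into the nested input.
ragged = np.empty(2, dtype=object)
ragged[0] = a
ragged[1] = b

print(ragged.shape)               # (2,)  only the outer shape is recorded
print(ragged.dtype)               # object
print([m.shape for m in ragged])  # [(2, 2), (3, 3)]  queried element by element
```

The element shapes are still there, but only on the individual Python objects; nothing in the array's own shape or dtype describes them.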

To be fair, I'm not sure if this would be the right way to implement the conditional intensity matrices for ctbns. I need to look more closely at how they were originally implemented; I had been trying to do it de novo, to see if I could make them work nicely with the nd.array directly. But it was a place where I figured the freer data-type structure could be useful.

I just ended up stymied immediately upon trying to explore whether that would work well and was a bit surprised at the error message.