Closed pucicu closed 1 year ago
Hi @pucicu ,
could you please elaborate on which data you used and how your code snippet looks like, so that we can reproduce it?
Greetings
The analysis here https://zenodo.org/record/5578298 worked in the past, but now pyunicorn now throws the error
ValueError: Buffer dtype mismatch, expected 'ADJ_t' but got ‘short’
Simply use the data and analysis from this repository and let me know if it works for you.
Thanks and best wishes Norbert
Am 24.06.2022 um 13:36 schrieb lschmidt @.***>:
Hi @pucicu https://github.com/pucicu ,
could you please elaborate on which data you used and how your code snippet looks like, so that we can reproduce it?
Greetings
— Reply to this email directly, view it on GitHub https://github.com/pik-copan/pyunicorn/issues/163#issuecomment-1165487732, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHNNEGPBTXJN2N4VKD24BTVQWMTPANCNFSM5WB77YAQ. You are receiving this because you were mentioned.
Update:
Problem appears to be within cython modules
error message:
src/pyunicorn/timeseries/visibility_graph.py in retarded_local_clustering(self)
254 norm = retarded_degree * (retarded_degree - 1) / 2.
255
--> 256 _retarded_local_clustering(N, A, norm, retarded_clustering)
257 return retarded_clustering
258
src/pyunicorn/timeseries/_ext/numerics.pyx in pyunicorn.timeseries._ext.numerics._retarded_local_clustering()
ValueError: Buffer dtype mismatch, expected 'ADJ_t' but got 'short'
problem:
pyunicorn/timeseries/visibility_graph.py
in self.retarded_local_clustering()
_retarded_local_clustering(N, A, norm, retarded_clustering)
:
A
has dtype int16
(short
in C) instead of expected ADJ_t
(~int8
as defined in pyunicorn.core._ext.types
)norm
and retarded_clustering
have dtype float64
(double
in C) instead of expected FIELD_t
(~float 32
as defined in pyunicorn.core._ext.types
)self.advanced_local_clustering()
quick fix/workaround
pyunicorn/timeseries/visibility_graph.py
in self.retarded_local_clustering()
_retarded_local_clustering(N, A, norm, retarded_clustering)
do _retarded_local_clustering(N, A.astype("int8"), norm.astype("float32"), retarded_clustering.astype("float32"))
self.advanced_local_clustering()
thorough fix:
_[resolved] dtype of norm
and retarded_clustering
:_
initialize as np.zeros(self.N, dtype="float32")
within VisibilityGraph (lines 219 and 244)
[not resolved] dtype of A
:
pyunicorn/timeseries/visibility_graph.py
in self.__init__()
vg.visibility_relations()
returns adjacency matrix A
of dtype int8
A
is then passed to InteractingNetworks.__init__(self, A)
, which returns self.adjacency
of dtype int16
pyunicorn/core/network.py
in self.set_adjacency()
int16
(or int32
if N is bigger)Thank you for the detailed analysis, @fkuehlein! You have identified two oversights in 3dab5bf89d2e224fc319ddd64aeeecc480f27fba.
The easy thing first: The original types for the arguments norm
and retarded_clustering
were the deprecated FLOATTYPE=np.float
, which should have been translated as DFIELD=np.float64
instead of FIELD=np.float32
in the Cython function signature. I have fixed that now for all such cases in 2b489da9e504dfbdb621848bb82388f510c046ca.
For the argument A
, the appropriate solution is to downcast to ADJ_t
at the call site, using the helper core._ext.types.to_cy()
that exists for this purpose. I will leave that to you, and kindly ask you to add a test that triggers all possible scalar types for the adjacency matrix, in a similar fashion to test_generic.simple_instances()
. Please look out for other places where a similar pattern might occur.
There are several layers of causation to this bug, which I document here in hopes that it will inform the resolution and mitigation of other similar bugs:
Thanks for your help and the detailed additional info, @ntfrgl!
The original issue should be resolved as of 1651a2b
. For now, that commit remains on my fork and I will create a pull request as soon as I have cleared at least the rest of VisibilityGraph
of similar possible occurrences.
I have two questions remaining in order to proceed on this:
Appropriate solution for these cases
As I have understood, the minimal solution (which you have pursued so far) would be to generally pass arguments arg
of a cython module _ext.numerics._mod(arg)
with help of _ext.numerics._mod(to_cy(arg, ARGTYPE))
to make sure they are cast to the expected type ARGTYPE
(as is done throughout core.Network
).
All remaining variables that might be affected by this within VisibilityGraph
, however, are already explicitly set to the required dtype on initialization. Namely:
The empty placeholder for the adjacency matrix A
, which is correctly initialized to be of dtype=MASK
. I guess I keep that as it is.
VisibilityGraph.time_series
and VisibilityGraph.timings
, which are initialized to be of dtype=np.float32
. Keep that, or put dtype=FIELD
there instead?
Implementation of a respective test
I'm not sure what you mean by "trigger all possible scalar types for the adjacency matrix". self.adjacency
is a property of the Network
class. So maybe the idea is to minimally initialize all classes that inherit from Network
and check what scalar types they return for self.adjacency
. Does that make sense? Or else, could you elaborate on that?
Short-term answer
ADJ_t
, then the semantic type aliases are fulfilling part of their purpose.Long-term answer (worth discussing and creating a new issue for)
Thanks a lot for your help and elaborations.
I fixed the above case accordingly in VisibilityGraph
. I also looked through all of pyunicorn.timeseries
and found no more conflicting types of the adjacency matrix, they're all safely cast to ADJ
already.
I will therefore make a pull request now so the original issue can be fixed.
Then I will look into creating an appropriate test for these cases. Thanks for the hint to the existence of parametric test fixtures! Maybe we can briefly discuss your 'long-term anser' next week with @jdonges' help.
Ok, actually found another dtype=np.uint8
in timeseries._ext.numerics._visibility()
. The function in question is only used within test_timeseries.testVisibility()
though, which is currently commented out.
Fixed both the test and the cython function with 91836d4
for a start, on a new branch. Still, the test only covers a specific testing method, so test coverage is lacking (let alone parametrisation). I will therefore further look into enhancements.
Putting this here just for documentation.
Let me add to the critical observations in your latest commit message, since it doesn't have an associated issue yet:
_visibility()
: The core of this function is a loop with a 1-bit accumulator, and should not be allocating an array and calling several NumPy methods on it.testVisibility()
: A test which essentially replicates the code under test is almost useless.Resolved by #183.
When calling
vg.retarded_local_clustering()
a dtype mismatch error occurs:ValueError: Buffer dtype mismatch, expected 'ADJ_t' but got 'short'
(the same for
vg.advanced_local_clustering()
)