pik-copan / pyunicorn

Unified Complex Network and Recurrence Analysis Toolbox
http://pik-potsdam.de/~donges/pyunicorn/
Other
195 stars 86 forks source link

CrossRecurrencePlot matrix gives seemingly random values #128

Closed emmanuelasso closed 10 months ago

emmanuelasso commented 5 years ago

I am generating a crossRecurrence plot and every time I run it I get different recurrence matrices. I have not changed the parameters.

crp = CrossRecurrencePlot(x, y, metric='euclidean', tau = 0, dim = 1, threshold=0.05)

example:

first run: recurrence_matrix: array([[0, 0, 0, ..., 0, 0, 0], [0, 1, 1, ..., 1, 1, 1], [0, 1, 1, ..., 1, 1, 1], ..., [0, 1, 1, ..., 1, 1, 1], [0, 1, 1, ..., 1, 1, 1], [0, 1, 1, ..., 1, 1, 1]], dtype=int8)

second run:

recurrence_matrix: array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 1, ..., 1, 1, 1], ..., [0, 0, 1, ..., 1, 1, 1], [0, 0, 1, ..., 1, 1, 1], [0, 0, 1, ..., 1, 1, 1]], dtype=int8)

CharlotteGeier commented 1 year ago

Hi, did this issue ever get solved?

I'm facing the same problem when generating an InterSystemRecurrenceNetwork: The blocks of the adjacency matrix related to the CrossRecurrencePlot come out differently every time. Accordingly, running only the CrossRecurrencePlot yields different results with every run.

So far, I've tried working with the stable release pyunicorn v0.6.1, the current development version from pik-copan/pyunicorn that includes the 3dab5bf commit referenced above and the current jkroenke/pyunicorn version with the e45da87 commit that refers to this issue. Unfortunately, none of these seem to solve the problem for me. The testCrossRecurrencePlot() returns different distance matrices on each run. Maybe I'm overlooking something?

Any help would be greatly appreciated! Thanks!

fkuehlein commented 1 year ago

Hi @CharlotteGeier,

thanks for pointing out! I will look into it and get back to you as soon as possible.

CharlotteGeier commented 1 year ago

Hi @fkuehlein that sounds good, thanks :)

fkuehlein commented 1 year ago

Hi @CharlotteGeier,

the above commits are both included in the current development version. However, I can confirm that CrossRecurrencePlot is still not doing what it should. Something goes wrong in the calculation of the distance matrix, which results in incorrect output, changing with every re-calculation. As you experienced, this also affects the InterSystemRecurrenceNetwork class.

We are working on this, but cannot guarantee to resolve it soon. Please excuse this inconvenience and thanks again for pointing out the issue!

CharlotteGeier commented 1 year ago

Hi @fkuehlein,

thank you for looking into the matter!

For now, I've found a workaround (just in case anyone else comes across this issue): I compute the cross-recurrence plot using PyRQA, assemble the inter-system recurrence network manually and use a slightly adapted version of the InterSystemRecurrenceNetwork class to perform the analysis.

This seems to work quite well - but I'd still be interested in a "cleaner" solution.

fkuehlein commented 1 year ago

Thanks for sharing your workaround! We'll keep you updated about any advances :)

fkuehlein commented 1 year ago

I just looked into this again, wasn't able to fix it but narrowed it down a bit:

CrossRecurrencePlot.distance_matrix() gives faulty results. The problem must be in the underlying cython functions _manhattan_distance_matrix_crp(), _euclidean_distance_matrix_crp() and _supremum_distance_matrix_crp(), in the corresponding C functions or maybe somewhere at their interface. The problem actually appears for all three possible metrics.

This eventually leads to a faulty CrossRecurrencePlot.recurrence_matrix(), which is also needed in InterSystemRecurrenceNetwork.

A simple example to reproduce the bug:

import numpy as np
import matplotlib.pyplot as plt

from pyunicorn.timeseries import CrossRecurrencePlot

# create example timeseries
x = np.linspace(0,2*np.pi, 20)
sine = np.sin(x)

# calculate CrossRecurrencePlot
crp = CrossRecurrencePlot(x=sine, y=sine, threshold=0.2)

# show faulty distance matrix
crp_dist = crp.distance_matrix(crp.x_embedded, crp.y_embedded, crp.metric)
plt.matshow(np.log10(crp_dist)) #plotting log10 here, in order to make visible how messed up the result really is
plt.colorbar(fraction=0.0455, pad=0.04)

# show faulty recurrence matrix
crp_rec = crp.recurrence_matrix()
plt.matshow(crp_rec)
plt.colorbar(fraction=0.0455, pad=0.04)

Try RecurrencePlot instead to see what the result should look like:

from pyunicorn.timeseries import RecurrencePlot

rp = RecurrencePlot(sine, threshold=0.2)

# show correct distance matrix
rp_dist = rp._distance_matrix
plt.matshow(rp._distance_matrix)
plt.colorbar(fraction=0.0455, pad=0.04)

# show correct recurrence matrix
rp_rec = rp.recurrence_matrix()
plt.matshow(rp_rec)
plt.colorbar(fraction=0.0455, pad=0.04)

@ntfrgl, could you help me out on this?

Some further notes:

ntfrgl commented 1 year ago

@fkuehlein, thank you for isolating the issue. The array indexing used incorrect dimensions in all of those C functions. In the first commit above I fixed the bug in C, and in the second commit I ported it to Cython, where the boundscheck would have found it. I also removed the print statement.

This bug was simple enough to find by staring, after you had isolated it, but more generally you might want to use a proper debugger for Cython.

Next steps:

fkuehlein commented 1 year ago

Thanks a lot @ntfrgl, I can confirm CrossRecurrencePlot is working as it should now!

I will add tests later this week.

ntfrgl commented 1 year ago

Two comments on the updated tests:

fkuehlein commented 1 year ago

I did verify that, and it does trigger the original bug because even though the faulty recurrence matrix stays the same, it gets transposed on every calculation. My idea was to keep it minimal, but I agree that your suggestions would be the better option. I will rework the tests and also add a parametric test fixture.

ntfrgl commented 1 year ago

I see, thank you.

The added benefit of not making the validity of this test depend on implementation details of its testee, apart from the possibility of the latter changing in the future, is that we might be able to apply the same testing strategy to many more functions, with the hope of finding indexing issues more systematically, by extending tests/test_generic.py::simple_instances().

CharlotteGeier commented 11 months ago

@fkuehlein and @ntfrgl thanks a lot for fixing this! :)