pik-copan / pyunicorn

Unified Complex Network and Recurrence Analysis Toolbox
http://pik-potsdam.de/~donges/pyunicorn/
Other
195 stars 86 forks source link

BUG: Core dump and Segfault when calculating `Network.nsi_betweenness()` centrality #142

Closed OzanSahin92 closed 9 months ago

OzanSahin92 commented 4 years ago

I am currently trying to calculate the nsi_betweenness of my directed graph. printing my network object gives me

Network: undirected, 25600 nodes, 61761383 links, link density 0.188.

I tried to find the error with gdb and it seems to happen because of indexerrors in the cython function _nsi_betweenness. I think that the problem lies in the indexing via the degree. Some arrays in the function are of size [0,E] with E being the number of edges/links and indexing is done via the sum of degrees, which can be of size [0,2*E]. that results in a segmentation error.

running gdp python on the core dump file lead to this:

Reading symbols from python...

warning: core file may not match specified executable file.
[New LWP 9520]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `python nsi_bc.py 2 95 100'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00002b4d53031756 in __main__::_nsi_betweenness$241(long long, long long, Array<double, 1, C, mutable, aligned>, Array<long long, 1, C, mutable, aligned>, long long, Array<float, 1, C, mutable, aligned>, Array<float, 1, C, mutable, aligned>, Array<int, 1, C, mutable, aligned>, Array<int, 1, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, Array<int, 1, C, mutable, aligned>) ()

running gdb python again with the _nsi_betweenness function now pythonized lead to this:

(gdb) run nsi_bc.py 2 95 100
Starting program: /home/osahin/anaconda3/envs/myenv/bin/python nsi_bc.py 2 95 100
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fc0e8d06700 (LWP 18074)]
[New Thread 0x7fc0e8505700 (LWP 18075)]
[New Thread 0x7fc0e3d04700 (LWP 18076)]
[New Thread 0x7fc0e1503700 (LWP 18077)]
[New Thread 0x7fc0ded02700 (LWP 18078)]
[New Thread 0x7fc0dc501700 (LWP 18079)]
[New Thread 0x7fc0d9d00700 (LWP 18080)]
[New Thread 0x7fc0d74ff700 (LWP 18081)]
[New Thread 0x7fc0d4cfe700 (LWP 18082)]
[New Thread 0x7fc0d24fd700 (LWP 18083)]
[New Thread 0x7fc0cfcfc700 (LWP 18084)]
[New Thread 0x7fc0cd4fb700 (LWP 18085)]
[New Thread 0x7fc0cacfa700 (LWP 18086)]
[New Thread 0x7fc0c84f9700 (LWP 18087)]
[New Thread 0x7fc0c5cf8700 (LWP 18088)]
[New Thread 0x7fc0c34f7700 (LWP 18089)]
[New Thread 0x7fc0c0cf6700 (LWP 18090)]
[New Thread 0x7fc0be4f5700 (LWP 18091)]
[New Thread 0x7fc0bbcf4700 (LWP 18092)]
[New Thread 0x7fc0b94f3700 (LWP 18093)]
[New Thread 0x7fc0b6cf2700 (LWP 18094)]
[New Thread 0x7fc0b44f1700 (LWP 18095)]
[New Thread 0x7fc0b1cf0700 (LWP 18096)]
[New Thread 0x7fc0af4ef700 (LWP 18097)]
[New Thread 0x7fc0accee700 (LWP 18098)]
[New Thread 0x7fc0aa4ed700 (LWP 18099)]
[New Thread 0x7fc0a7cec700 (LWP 18100)]
[New Thread 0x7fc0a54eb700 (LWP 18101)]
[New Thread 0x7fc0a2cea700 (LWP 18102)]
[New Thread 0x7fc0a04e9700 (LWP 18103)]
[New Thread 0x7fc09fce8700 (LWP 18104)]
[New Thread 0x7fc09b4e7700 (LWP 18105)]
[New Thread 0x7fc098ce6700 (LWP 18106)]
[New Thread 0x7fc0964e5700 (LWP 18107)]
[New Thread 0x7fc093ce4700 (LWP 18108)]
[New Thread 0x7fc0914e3700 (LWP 18109)]
[New Thread 0x7fc090ce2700 (LWP 18110)]
[New Thread 0x7fc08c4e1700 (LWP 18111)]
[New Thread 0x7fc08bce0700 (LWP 18112)]
[Thread 0x7fc0b1cf0700 (LWP 18096) exited]
[Thread 0x7fc08bce0700 (LWP 18112) exited]
[Thread 0x7fc08c4e1700 (LWP 18111) exited]
[Thread 0x7fc090ce2700 (LWP 18110) exited]
[Thread 0x7fc0914e3700 (LWP 18109) exited]
[Thread 0x7fc093ce4700 (LWP 18108) exited]
[Thread 0x7fc0964e5700 (LWP 18107) exited]
[Thread 0x7fc098ce6700 (LWP 18106) exited]
[Thread 0x7fc09b4e7700 (LWP 18105) exited]
[Thread 0x7fc09fce8700 (LWP 18104) exited]
[Thread 0x7fc0a04e9700 (LWP 18103) exited]
[Thread 0x7fc0a2cea700 (LWP 18102) exited]
[Thread 0x7fc0a54eb700 (LWP 18101) exited]
[Thread 0x7fc0a7cec700 (LWP 18100) exited]
[Thread 0x7fc0aa4ed700 (LWP 18099) exited]
[Thread 0x7fc0accee700 (LWP 18098) exited]
[Thread 0x7fc0af4ef700 (LWP 18097) exited]
[Thread 0x7fc0b44f1700 (LWP 18095) exited]
[Thread 0x7fc0b6cf2700 (LWP 18094) exited]
[Thread 0x7fc0b94f3700 (LWP 18093) exited]
[Thread 0x7fc0bbcf4700 (LWP 18092) exited]
[Thread 0x7fc0be4f5700 (LWP 18091) exited]
[Thread 0x7fc0c0cf6700 (LWP 18090) exited]
[Thread 0x7fc0c34f7700 (LWP 18089) exited]
[Thread 0x7fc0c5cf8700 (LWP 18088) exited]
[Thread 0x7fc0c84f9700 (LWP 18087) exited]
[Thread 0x7fc0cacfa700 (LWP 18086) exited]
[Thread 0x7fc0cd4fb700 (LWP 18085) exited]
[Thread 0x7fc0cfcfc700 (LWP 18084) exited]
[Thread 0x7fc0d24fd700 (LWP 18083) exited]
[Thread 0x7fc0d4cfe700 (LWP 18082) exited]
[Thread 0x7fc0d74ff700 (LWP 18081) exited]
[Thread 0x7fc0d9d00700 (LWP 18080) exited]
[Thread 0x7fc0dc501700 (LWP 18079) exited]
[Thread 0x7fc0ded02700 (LWP 18078) exited]
[Thread 0x7fc0e1503700 (LWP 18077) exited]
[Thread 0x7fc0e3d04700 (LWP 18076) exited]
[Thread 0x7fc0e8505700 (LWP 18075) exited]
[Thread 0x7fc0e8d06700 (LWP 18074) exited]
[Detaching after fork from child process 18119]
yo
0
Traceback (most recent call last):
  File "nsi_bc.py", line 174, in <module>
    _nsi_betweenness(N, E, w, k, j, betweenness_to_j, excess_to_j, offsets.astype(np.int32), flat_neighbors, is_source, flat_predecessors)
  File "nsi_bc.py", line 55, in _nsi_betweenness
    flat_predecessors[fi] = i
IndexError: index 123522767 is out of bounds for axis 0 with size 123522767
[Inferior 1 (process 18036) exited with code 01]
fkuehlein commented 9 months ago

Could not reproduce this, Network.nsi_betweenness() works fine for me on Networks of up to 10000 nodes (I'm limited in Network size by my old machine's ram here).

Assuming this has been resolved by @ntfrgl's recent fixes in the Cython/C modules, and hence closing the issue.

ntfrgl commented 8 months ago

@fkuehlein, I don't think that you have provided sufficient reason for closing this issue: The work you mentioned on the Cython/C method signatures wouldn't concern an indexing error deeply inside a Cython function, and you could in principle, say, spin up a moderately sized Amazon Elastic Compute Cloud instance to obtain enough RAM for reproducing the issue.

Nonetheless, I agree with closing the issue for now. My commit above removes some obvious slowdowns in the method, after which I successfully tested it on a random network with slightly higher node and edge counts than the bug report:

from pyunicorn import Network
net = Network.Model("ErdosRenyi", n_nodes=26000, link_probability=.2)
btw = net.nsi_betweenness(parallelize=True)

The explanation suggested by the issue author is unfortunately incorrect: As is now explicitly asserted, the array flat_predecessors is of size E = 2 * n_links, which is the sum of all degrees (both for directed and undirected networks) rather than the number of edges.

In contrast, the error message above appears to have n_links = 61761383 and E = 123522767 = 2 * n_links + 1, which might indicate an inconsistent data structure produced by the user's script. Otherwise, it is of course still conceivable that there remains an uncovered edge case in the implementation, but we would probably need the exact adjacency matrix to reproduce it.

fkuehlein commented 8 months ago

Right, I was a bit too quick to conclude there..

All the more thanks for your speedup work and for providing good reference and documentation!