snap-stanford / snap

Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.
Other
2.18k stars 799 forks source link

Temporal motifs integer overflow + possible fix #250

Open narnolddd opened 8 months ago

narnolddd commented 8 months ago

Hi there,

I came across a small counting bug when I was implementing a modified version of your three-edge up-to-three node temporal motifs algorithm in our temporal networks library and comparing to check if we were getting the same/similar counts. I don't code in C++, my version is in Rust so I can't pin down exactly where it is in the code but I expect there is an i32 somewhere that is overflowing (I actually introduced the same error in my code so thought I'd point it out).

I was testing it on some crypto transactions data where there is a super-node with 1000s of outgoing transactions with a single other node in a short space of time. I narrowed it down to a minimal set of transactions that introduces a bug, I've attached to this post at the bottom. It's of the form

1 2 1
1 2 2
1 2 3
...
1 2 2346

i.e. a directed edge from node 1 to 2 at every second up to 2346 seconds -- it's an oddly specific number but it's the smallest number of edges that would cause it 😅

Running motifs (with delta = 3600 but doesn't really matter) returns the following output in counts

18446744069414584320 0 0 0 0 18446744069414584320
0 0 0 0 0 0
0 0 0 0 0 0
18446744069414584320 0 18446744069414584320 0 0 0
0 0 0 0 0 0
2149201880 0 18446744069414584320 0 0 18446744069414584320

where the "all outgoing/all incoming" stars look close to the u64 max value (but there are no stars in the data just two node motifs).

Deleting the last line of the original file (so going just up to 2345 edges) results in counts of:

0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
2146453540 0 0 0 0 0

I noticed the bottom left counts in each instance are just slightly either side of the i32 max value (2147483647) so I wondered if somewhere when the two node counts are being subtracted from the star counts, there is a mismatch in i32 and u32/64 being used?

Thanks for reading this, I realise there are probably very few instances where you would hit this corner case of so many edges between just two nodes but it happens a fair bit in the usecase I'm dealing with which is why I thought I'd just make an issue. Let me know if I can help in any way.

sub.csv