snap-stanford / snap

Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.
Other
2.17k stars 797 forks source link

Node Ids are signed 32-bits integers #113

Closed hhromic closed 7 years ago

hhromic commented 7 years ago

Hi ! I've been using SNAP for a while until recently that I "discovered" that the library uses signed 32-bits integers for Node Ids (TInt). This introduces an important limitation for my research because I'm using 64-bits unsigned integers for user ids (i.e. user ids from Twitter). For now I resorted to using a re-indexing + mapping strategy which costs memory and a bit of performance for bi-directional lookups.

Are there any plans in near future to move to unsigned 64-bits integers for node ids? I checked the source code and SNAP seems to rely on negative integers and in some parts id arguments are hardcoded to const int& (instead of TInt or something better like TNodeId). So I see this change might not be as straight-forward and I understand that.

I think further abstracting the node id type (i.e. a typedef) and not depending on negative numbers would open the full range of numbers assignable and thus make the library more powerful for big graphs. Any comments are highly appreciated!.

Thanks!

blazf commented 7 years ago

Hi Hugo, there is a development version of SNAP that uses 64 ints for node and other ids and can deal with graphs with more than 1.5 billion nodes: https://github.com/snap-stanford/snap-dev-64

hhromic commented 7 years ago

Hi @blazf , Thanks for your answer. I didn't know about that effort and went to check it. Indeed it seems to use TUInt64 instead of TInt, which is good news. However I noticed that it is still using signed integers. I understand that this is mostly to support using negative ids for marking nodes as invalid or "not initialised". While this is a valid design choice, I think the cost of sacrificing half of the integer range for ids is too high for this purpose, as there are other alternatives. For example using a "valid" flag. In my opinion this is an unnecessary limitation over the ids, but I understand it is not something easy to change or challenge. I think working out this limitation would be important for the scalability of the library. I would gladly help investigating this but I have zero knowledge of the inner workings of SNAP, hence I don't fully understand the design choice of signed integers for ids.

Thanks anyway! Hugo.