stingergraph / StingerGraphs.jl

Julialang bindings to the STINGER graph database
http://www.stingergraph.com
Other
5 stars 3 forks source link

Kcore #17

Closed rohitvarkey closed 7 years ago

rohitvarkey commented 7 years ago

K-core implementation which basically just calls the STINGER kcore_find function and returns the labels array.

@jpfairbanks Should I return the counts array also here?

jpfairbanks commented 7 years ago

Yes I think the counts are also useful. Although that is going to be length nv when only a small fraction of the values are nonzero so the documentation should mention that the counts array is overallocated and on the values up to the first 0 are valid. You could could extract the actual counts with

for k in 1:nv(g)
    if counts[k] == 0
          break
    end
end
counts[1:k]

Actually this is a good use of the two level api. a kcore!(graph, labels, counts) function which does in place manipulation and a higher level which allocates labels and counts and returns them.

The higher level interface should return counts[1:k] but the low level interface should return labels and counts without dropping the 0s so that counts can be reused.

rohitvarkey commented 7 years ago

I was trying to follow the same semantics we offered in bfs. The plan was to make the application keep track of the number of vertices that have been inserted so far. Providing that number would allocate the optimal amount of memory.

function kcore(s::Stinger, nv::Int64)
   labels = zeros(Int64, nv)
   counts = zeros(Int64, nv)
   ....

In these cases, we don't need to run a loop to extract the counts.

counts array is overallocated and on the values up to the first 0 are valid. You could could extract the actual counts with

This can lead to issues if the user makes the mistake of starting vertex IDs with 1 instead of 0, as the first element in count will be a 0 and an empty Array will be returned. I think that is quite confusing from the perspective of a user.

I do agree that a two level API will be useful. But the levels I'd propose are:

Higher level - function kcore(s::Stinger, nv::Int64) - This allocates the labels and counts as the specified number of vertices.

Lower level - function kcore!(s::Stinger, labels::Array{Int64}, counts::Array{Int64}) - This performs the ccall and populates labels and counts.

We could also have another function kcore(s::Stinger) which overallocates with the maximum number of vertices possible, calls the lower-level API and then performs the extraction of counts and labels as proposed.

jpfairbanks commented 7 years ago

I agree your design is correct.

rohitvarkey commented 7 years ago

I decided to make a call to get_nv and find the maximum active vertex first to allocate only the required memory, rather than allocating the memory for the maximum vertices for labels and counts and then iterating through that to find out the max and returning only that part.