igraph / xdata-igraph

xdata igraph, has been merged into igraph/igraph
GNU General Public License v2.0
18 stars 3 forks source link

Design data structure for graphs that change in time #11

Closed gaborcsardi closed 10 years ago

gaborcsardi commented 10 years ago

Essentially we need a data structure in which we can assign time labels to vertices and edges. In the most general framework, one can assign a list containing "birth" and "death" time points, to each vertex and edge. This might seem overkill, but it is needed as soon as we have data in which vertices can disappear and reappear.

Ideally the data structure should be compatible with the one we have in igraph currently. Changing it is not impossible, but it is a considerable amount of work.

More in the wiki: https://github.com/gaborcsardi/igraph/wiki/Data-Structure-for-Time-Dependent-Graphs

gaborcsardi commented 10 years ago

I always use a list of g, e.g., if I want to create a time series of graph for T=10, then

 glist <- NULL
 set.seed(12345)
 for (i in 1:10) {
     glist[[i]] <- erdos.renyi.game(100,0.2)
 }

 sapply(glist,ecount)
 # [1]  997 1017  962 1004 1011 1006  997 1020  998  961

 sapply(glist,function(x) max(local.scan(x,k=1)))
 # [1] 118 125 110 115 109 106 136 128 135 104

For now, it's good enough for me to work on.

However, for a long term, it may not a bad idea to create a new "tsg" class or something, similar to "ts" (http://bit.ly/1aPmp4X), that is, you may want to embed time info, deltat (?), etc... For example, if we want to build T=10 time series of ER(n=100,p) with different p's = pvec, then

 tsg <- build.tsg(model="erdos.renyi.game", T=seq(1990,2000), n=100, p=pvec,...)

Then, eventually we may want to let some igraph functions be able to handle this tsg, e.g.,

plot(tsg)
ecount(tsg)
local.scan(tsg)

This is just my humble opinion.

I know Gabor may need much more details than this, e.g., exact structure of "tsg". I haven't discuss this with other members; they must have different/better ideas??

  • Youngser
gaborcsardi commented 10 years ago

On Mon, Dec 9, 2013 at 4:50 PM, Youngser Park youngser@jhu.edu wrote: need much more details than this, e.g., exact structure of "tsg". I haven't discuss this with other members; they must have different/better ideas??

I had the same idea to have a separate object type for TSGs during XDATA. Sometimes in large datasets, you have thousands of entities but only a small number have edges in a time interval. So you could store everything more efficiently if you have a static list of entities/vertices , and the rows/columns of adjacency matrices at each time interval corresponded to only those vertices that had an edge in that time interval. So at each time interval, we would also have a map <rc,v> where rc corresponds to the row/column index and v correspond to the vertex in the static list. Hopefully this would not complicate the implementation too much.

This is my even humbler opinion :)

Sancar Adali

gaborcsardi commented 10 years ago

I have something in mind, and will sketch it in the wiki, because text editing is somewhat limited here. We can still discuss issues here, though.

https://github.com/gaborcsardi/igraph/wiki/Data-Structure-for-Time-Dependent-Graphs

jovo commented 10 years ago

I think Disa has something to say about this?

sent from a tiny miracle cuboid.

On Dec 10, 2013, at 11:10 AM, Gabor Csardi notifications@github.com wrote:

I have something in mind, and will sketch it in the wiki, because text editing is somewhat limited here. We can still discuss issues here, though.

https://github.com/gaborcsardi/igraph/wiki/Data-Structure-for-Time-Dependent-Graphs

— Reply to this email directly or view it on GitHub.

disa-mhembere commented 10 years ago

I think Youngser and Sancar's requests are on point from my perspective. My only request is for you to support slicing (e.g tsg_obj[1:10:2]) semantics for a tsg object if possible.

Thanks Disa Mhembere Johns Hopkins University :(){:|:&};:

On Dec 10, 2013, at 11:52 AM, joshua vogelstein notifications@github.com wrote:

I think Disa has something to say about this?

sent from a tiny miracle cuboid.

On Dec 10, 2013, at 11:10 AM, Gabor Csardi notifications@github.com wrote:

I have something in mind, and will sketch it in the wiki, because text editing is somewhat limited here. We can still discuss issues here, though.

https://github.com/gaborcsardi/igraph/wiki/Data-Structure-for-Time-Dependent-Graphs

— Reply to this email directly or view it on GitHub. — Reply to this email directly or view it on GitHub.

gaborcsardi commented 10 years ago

What would tsg_obj[1:20:2] do?

disa-mhembere commented 10 years ago

I forgot we are talking R here. I meant it to mean elements 1 to 20 with a step increment of 2 i.e 1,3,5,7...19. Not sure what that should look like in R. If that's a pain then tsg_obj[1:20](all elements 1:20) is still great.

Thanks

On Tue, Dec 10, 2013 at 1:28 PM, Gabor Csardi notifications@github.comwrote:

What would tsg_obj[1:20:2] do?

— Reply to this email directly or view it on GitHubhttps://github.com/gaborcsardi/igraph/issues/11#issuecomment-30269704 .

gaborcsardi commented 10 years ago

Hmmm. Can you give a more concrete example? What are elements here? Graphs at different points of time?

I am not sure what tsg_obj is here, but if it is an igraph graph, then the [ indexing operator is already used for something else, so is [[.

adalisan commented 10 years ago

the most basic implementation would be tsg_obj as an object consisting of a list of timepoint graphs. Each timepoint graph would be an igraph object. tsg_obj might also have timestamps. My suggestion was that either this tsg_obj or a sparse version of it would be more memory efficient if we did not replicate a very large number of vertices for each igraph object. tsg_obj would have a joint list of all the vertices that exist in at least one timepoint graph. And each of the timepoint graphs (igraph objects) contain only vertices that have an edge in that timepoint. The adjacency matrices would be sparse matrices. Let's say we have 100K entities/vertices, but only 1000 of them have an edge at each time point. I think we can handle such a time series of (timepoint) graphs if we have a sparse tsg_obj like I described. I realize this complicates the implementation. It really depends on whether we encounter this kind of big graph data in the future. Other people can comment on whether this would be useful.

gaborcsardi commented 10 years ago

Would you mind creating an actual concrete example? In R? Like, say tsg_obj is a list of length 10:

tsg_obj <- list(10)

The first element is a list of 100 (?) vertices. These are the vertices ever present in the graph:

tsg_obj[[1]] <- 1:100

The rest of the elements are adjacency matrices (?):

tsg_obj[[2]] <- Matrix(0, 10, 10)

And so on. Because I am just not sure how this would work.

disa-mhembere commented 10 years ago

By slicing I mean -- if we have 100 timepoints, each of which has a graph/adjacency matrix (what I called "element" before) associated with it; if I just want to work with the 1st 50 timepoints, for instance, it would be nice to be able to get only these 50 easily. This is analogous to slicing with any sequence tsg_obj[1:50]. I personally don't care about the actual syntax so it need not be [ or [[ . Or we could just write our own stuff to do so -- I think this is secondary to actually creating the data structure so maybe you can do this last if time permits.

On Tue, Dec 10, 2013 at 4:53 PM, Gabor Csardi notifications@github.comwrote:

Hmmm. Can you give a more concrete example? What are elements here? Graphs at different points of time?

I am not sure what tsg_obj is here, but if it is an igraph graph, then the [ indexing operator is already used for something else, so is [[.

— Reply to this email directly or view it on GitHubhttps://github.com/gaborcsardi/igraph/issues/11#issuecomment-30284491 .

gaborcsardi commented 10 years ago

The problem with this, is that sometimes all edges are in a different time point. And then this would mean creating millions of small networks for representing a single network. Also, having a representation that does not allow isolate vertices is a bit too restrictive imo.

As for slicing, probably we mean different things by that. I don't think it is very useful to select a subsequence [1:50] from the sequence of graphs. What would you do with it? Remember, it is not a graph itself, so you cannot even call local.scan() or any other igraph function on it.

What I mean by slicing is that you give a point of time, and the slicing function returns a graph that is an igraph object corresponding to the network that existed at that point of time. Even better, you don't actually have to create the igraph object at all, you just call local.scan() (or any other function) and specify the time point in the call, e.g.:

local.scan(g, at="1999-10-29")

or maybe this is easier to implement:

at("1999-10-29", local.scan(g))

and this would work with any igraph function, without actually modifying them.

disa-mhembere commented 10 years ago

That sounds reasonable. If you made that happen that would be awesome.

Disa Mhembere Johns Hopkins University :(){:|:&};:

On Dec 10, 2013, at 7:27 PM, Gabor Csardi notifications@github.com wrote:

The problem with this, is that sometimes all edges are in a different time point. And then this would mean creating millions of small networks for representing a single network. Also, having a representation that does not allow isolate vertices is a bit too restrictive imo.

As for slicing, probably we mean different things by that. I don't it is very useful to select a subsequence [1:50] from the sequence of graphs. What would you do with it? Remember, it is not a graph itself, so you cannot even call local.scan() or any other igraph function on it.

What I mean by slicing is that you give a point of time, and the slicing function returns a graph that is an igraph object corresponding to the network that existed at that point of time. Even better, you don't actually have to create the igraph object at all, you just call local.scan() (or any other function) and specify the time point in the call, e.g.:

local.scan(g, at="1999-10-29") or maybe this is easier to implement:

at("1999-10-29", local.scan(g)) and this would work with any igraph function, without actually modifying them.

— Reply to this email directly or view it on GitHub.

elhmn-uw commented 10 years ago

as long as we are on the topic, i wonder if there is a function in R that read in the file with following data and split out a list of igraph objects:

raw data = data.frame with columns:

time, from, to, attribute for from, attribute for to

w/ classes numeric, factor, factor, factor, factor

output = a list of igraph objects

the t-th element of the list is an igraph object that represents the t-th period's frequency count (not just zero-one), where the t-th period is defined in some way as an input

gaborcsardi commented 10 years ago

Continued at the main repo.