wlwardiary / cable2graph

WikiLeaks Cablegate Reference Network Visualization : cables.csv to graph to svg/html5
https://dataporn.tumblr.com

Inconsistent edge direction #28

Open thedod opened 11 years ago

thedod commented 11 years ago

If you view the following graphs with visible dates:

In both cases if you look at the later cable, it refers to the earlier one (logical), but the second graph is "futuristic" :).

My theory is that the problem is in the timeline template.

wlwardiary commented 11 years ago

This might be because the full graph is actually stored as undirected, maybe igraph doesn't really care about orders in that case.

It should be a directed graph, but this needs a manual cross-check against some cables. Also, changing to directed does affect some community functions and the calculation of betweenness etc.

Can you name the edges? From MRN to MRN? The example was not clear. A reference to the future might be possible if the cable is missing and the date was guessed.

thedod commented 11 years ago

Doh! Last night, I wanted to add this comment here, but added it to #23 instead :s

Seems like there are inconsistencies within the same graph:

  • 09MADRID551 has arrows pointing at it from the cables it refers to.
  • On the same graph, the missing cable 06STATE190832 got arrows pointing to it from neighbors that refer to it.

It's true that 06STATE190832 is missing (and its date was guessed), but I've checked the cables and the neighboring nodes refer to it. They all answer a standard questionnaire, so we can even guess the content of that cable :)

Regarding missing cables: I believe a reference will always point to a missing cable, never from one, because any reference from it would be, erm, missing. Am I right?

[possibly a new ticket] post process the graphs

If the graph is not directional, and the algorithms demand(?) that it won't be, we should either remove the arrowheads (which would be a shame IMHO), or add a phase that takes the result of a split or nbh, and 'redirectionalizes' the edges according to dates.
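A minimal sketch of such a 'redirectionalize' phase, in plain Python rather than igraph. It assumes a cable can only refer to an older cable, so each edge is oriented from the later cable to the earlier one. The function name, the MRNs and the dates are hypothetical, and edges with a missing or identical date are set aside instead of guessed.

```python
from datetime import date

def orient_by_date(edges, dates):
    """Orient each edge as (referring cable, referenced cable).

    Assumption: the later cable is the one doing the referring.
    Edges with an unknown or identical date are returned separately.
    """
    oriented, undecided = [], []
    for a, b in edges:
        da, db = dates.get(a), dates.get(b)
        if da is None or db is None or da == db:
            undecided.append((a, b))
        elif da > db:
            oriented.append((a, b))
        else:
            oriented.append((b, a))
    return oriented, undecided

# Hypothetical MRNs and dates, for illustration only.
dates = {
    "09EXAMPLE3": date(2009, 6, 1),
    "08EXAMPLE2": date(2008, 3, 15),
    "06EXAMPLE1": date(2006, 11, 20),
}
edges = [
    ("06EXAMPLE1", "09EXAMPLE3"),  # stored the wrong way round
    ("09EXAMPLE3", "08EXAMPLE2"),  # already later -> earlier
    ("08EXAMPLE2", "UNKNOWN"),     # no date known for one endpoint
]
oriented, undecided = orient_by_date(edges, dates)
print(oriented)
print(undecided)
```

Guessed dates of missing cables would make this only as reliable as the guess, so the undecided list is worth logging.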

wlwardiary commented 11 years ago

Yes, the reference network should be a directed graph; I just never took the time to confirm whether the directed values are correct.

It might be that some of the scripts fail if the graph is changed to directed. At some point this caused trouble, but the scripts should be adjusted accordingly. Maybe it was one of the community algos? Don't remember.

The different in- and out degree measures should actually give some interesting values.

Changing the graph in c2g to directed might just work and the problem with the arrows could be in g2svg?

Also, even though the cables are created in linear time, it might be that some edges are mutual?

>>> g = igraph.load('full.graphml')
>>> g.to_directed()
>>> g.summary()
IGRAPH D--- 306249 487078 --
>>> g.is_mutual().count(True)
487078

After changing c2g and creating a directed full.graphml:

>>> g = igraph.load('full.graphml')
>>> g.is_mutual().count(True)
10544
>>> g.is_mutual().count(False)
355073
>>> print g.summary()
IGRAPH D--- 306249 365617 --

thedod commented 11 years ago

I've just tried to fix this at g2svg (will send a pull request). I ran it on the graph I link to in my comment from 4 hours ago, so now you can no longer see the bug there. Sorry about that :)

If you think the pull request solves this issue, I think we can close it.

wlwardiary commented 11 years ago

Not really happy with it; I'd rather fix the graph as directed and finally have it verified.

Also the values in the example above are wrong.

>>> g.to_directed(mutual=False)
>>> g.is_mutual().count(False)
236157
>>> g.is_mutual().count(True)
7382
>>> print g.vcount(), g.ecount()
306249 243539

This does the correct conversion to directed.
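To illustrate why the first to_directed() call reported every edge as mutual: by default igraph turns each undirected edge into a pair of opposite arcs, which doubles the edge count. The sketch below mimics that on a toy edge list in plain Python (the helper names are made up for illustration) and then checks that the counts quoted in this thread add up.

```python
def to_directed_mutual(undirected_edges):
    """Mimic the default conversion: each edge {a, b} becomes a->b and b->a."""
    arcs = []
    for a, b in undirected_edges:
        arcs.append((a, b))
        arcs.append((b, a))
    return arcs

def count_mutual(arcs):
    """Count arcs whose reverse arc is also present."""
    arc_set = set(arcs)
    return sum(1 for a, b in arcs if (b, a) in arc_set)

undirected = [(0, 1), (1, 2), (2, 3)]
arcs = to_directed_mutual(undirected)
print(len(arcs), count_mutual(arcs))  # every arc has a reverse partner

# Consistency checks on the numbers quoted in the thread.
assert 2 * 243539 == 487078        # default conversion doubles the edges
assert 236157 + 7382 == 243539     # non-mutual + mutual, mutual=False
assert 355073 + 10544 == 365617    # non-mutual + mutual, directed c2g run
```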

thedod commented 11 years ago

I agree that it's best to have a correct graph in the first place. My patch that does it at g2svg is just a quick and dirty way to get graphs that are, if not 100% correct, at least pretty close to that (maybe there are errors in timestamps of non-missing cables). I simply need something I can research, use in posts, etc.

I don't understand what to do with your code snippet (I know nothing about the igraph library [at least yet]): Is that snippet a "recipe" to patch an existing full graph? If it is, I'd feel more comfortable if you could fix c2g and/or extract so that if I start from a fresh cables.csv - I get a graph with correct directionality (this can be checked with a version of my #30 g2svg patch that also logs to stderr all edges that have a true zero_refs_one).

> Also the values in the example above are wrong

I'm not sure I understand what this sentence refers to. The graphs that my patched g2svg produce, or something else?

One more question: does is_mutual() mean bi-directional edges? How can this happen? Cables have a partial order. They can't even have loops.

wlwardiary commented 11 years ago

Did some testing with a directed version of a graph. It seems all directions are inverted and only the edges to the missing cables are correct? It's still inconclusive.
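One way to make the "all directions are inverted" suspicion measurable is to tally, for every arc, whether its source is newer or older than its target. This is only a sketch in plain Python with hypothetical cable names and dates. It assumes arcs are stored as (source, target) and that a correct arc runs from the newer, referring cable to the older one.

```python
from collections import Counter
from datetime import date

def tally_directions(arcs, dates):
    """Classify each arc by the relative dates of its endpoints."""
    tally = Counter()
    for source, target in arcs:
        ds, dt = dates.get(source), dates.get(target)
        if ds is None or dt is None:
            tally["unknown"] += 1
        elif ds > dt:
            tally["refers_to_older"] += 1   # the expected case
        elif ds < dt:
            tally["refers_to_newer"] += 1   # "futuristic" arc
        else:
            tally["same_day"] += 1
    return tally

# Hypothetical cables and dates, for illustration only.
dates = {"A": date(2008, 1, 1), "B": date(2007, 1, 1), "C": date(2009, 1, 1)}
arcs = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "MISSING")]
tally = tally_directions(arcs, dates)
print(dict(tally))
```

If "refers_to_newer" dominates on the real data, the stored direction is probably inverted as a whole rather than wrong in isolated spots.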

g2svg is based on the igraph.write_svg function and looking at the original it has support for arrows.

from line 1919 in igraph/__init__.py

            if directed:
                # Dirty hack because of the SVG specification:
                # markers do not inherit stroke colors
                print >>f, "    <g transform=\"translate(%.4f,%.4f)\" fill=\"%s\" stroke=\"%s\">" % (x2, y2, edge_colors[eidx], edge_colors[eidx]) 
                print >>f, "      <line x1=\"%.4f\" y1=\"%.4f\" x2=\"0\" y2=\"0\"/>" % (x1-x2, y1-y2)
                print >>f, "      <use x=\"0\" y=\"0\" xlink:href=\"#Triangle\" transform=\"rotate(%.4f)\"/>" % (180+angle*180/math.pi,)
                print >>f, "    </g>\n"
            else:
                print >>f, "    <line x1=\"%.4f\" y1=\"%.4f\" x2=\"%.4f\" y2=\"%.4f\" style=\"stroke: %s\"/>" % (x1, y1, x2, y2, edge_colors[eidx])

Will do some more tests to see if igraph.write_svg creates a correct graph.

The changes to c2g are minimal, just directed=True ^_^

print "Create graph..."
g = igraph.Graph(edges, directed=True)

No need to run extract, the data/edges.list file has correct values.

They are human-made references, anything can happen :) is_mutual() means bi-directional, and such edges might even be valid. Need to verify that. Yes, that means the template needs support for mutual edges as well. Self loops should be removed.

By "values" I meant the "306249 365617"; they are incorrect.
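As a rough illustration of what "mutual" and "self loop" mean on an edge list, here is a plain-Python sketch that separates the three cases. The function name and node labels are made up. On the real graph, igraph's own is_mutual() and simplify() are probably the more practical route.

```python
def split_edges(arcs):
    """Return (arcs without self loops, mutual pairs, self loops)."""
    arc_set = set(arcs)
    self_loops = [(a, b) for a, b in arcs if a == b]
    # A pair is mutual when both a->b and b->a exist; report each pair once.
    mutual = sorted(
        {tuple(sorted((a, b))) for a, b in arc_set if a != b and (b, a) in arc_set}
    )
    cleaned = [(a, b) for a, b in arcs if a != b]
    return cleaned, mutual, self_loops

# Hypothetical arcs: A and B refer to each other, C refers to itself.
arcs = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "C")]
cleaned, mutual, self_loops = split_edges(arcs)
print(cleaned)
print(mutual)
print(self_loops)
```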

This tutorial is a good introduction to igraph and pydoc igraph.Graph answers most after that.

thedod commented 11 years ago

Removed the g2svg tweak pull request (it's on the direction-patch branch of my fork if you need it). Tried g = igraph.Graph(edges, directed=True) and splitgraph now says:

Traceback (most recent call last):
  File "./splitgraph", line 275, in <module>
    main()
  File "./splitgraph", line 177, in main
    gcml = giant.community_multilevel()
  File "/usr/lib/python2.7/dist-packages/igraph/__init__.py", line 1047, in community_multilevel
    raise ValueError("input graph must be undirected")
ValueError: input graph must be undirected
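
One possible way around this error is to keep the directed arcs for drawing and hand the community step an undirected view of the same data. The sketch below shows the idea on a plain edge list, with a made-up helper name and toy labels. In igraph itself, calling as_undirected() on the giant component before community_multilevel() would presumably play the same role, though that is untested against splitgraph.

```python
def undirected_view(arcs):
    """Collapse directed arcs into unique unordered pairs, dropping self loops."""
    return sorted({tuple(sorted((a, b))) for a, b in arcs if a != b})

# Hypothetical directed arcs: one mutual pair, one plain arc, one self loop.
arcs = [("A", "B"), ("B", "A"), ("C", "B"), ("C", "C")]
edges = undirected_view(arcs)
print(edges)
```

The directed list stays untouched, so arrowheads can still be rendered from it after the communities are computed.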