wlwardiary / cable2graph

WikiLeaks Cablegate Reference Network Visualization : cables.csv to graph to svg/html5
https://dataporn.tumblr.com

determine possible dates for missing cables #2

Closed wlwardiary closed 13 years ago

wlwardiary commented 13 years ago

Other cables in the same ID range could provide the lower and upper bounds. Even a rough guess at the time range would already be helpful.
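A minimal sketch of the bounding idea, under assumptions that are not confirmed in this thread: cable IDs look like two-digit year + origin name + sequence number (e.g. `09KABUL200`), and sequence numbers increase with time within one year and origin. The names `split_id`, `build_index` and `date_bounds` are hypothetical and not part of cable2graph.

```python
import bisect
import re
from datetime import date

# Assumed ID layout: two-digit year, origin name, sequence number.
ID_RE = re.compile(r"^(\d{2})([A-Z]+)(\d+)$")


def split_id(cable_id):
    """Split an ID into its ((year, origin), sequence number) parts."""
    m = ID_RE.match(cable_id)
    if m is None:
        raise ValueError(f"unrecognised cable id: {cable_id!r}")
    year, origin, seq = m.groups()
    return (year, origin), int(seq)


def build_index(known):
    """Group known cables by (year, origin), sorted by sequence number."""
    index = {}
    for cable_id, sent in known.items():
        key, seq = split_id(cable_id)
        index.setdefault(key, []).append((seq, sent))
    for entries in index.values():
        entries.sort()
    return index


def date_bounds(missing_id, index):
    """Return (lower, upper) dates taken from the nearest known neighbours.

    A bound is None when there is no known neighbour on that side.
    """
    key, seq = split_id(missing_id)
    entries = index.get(key, [])
    seqs = [s for s, _ in entries]
    pos = bisect.bisect_left(seqs, seq)
    lower = entries[pos - 1][1] if pos > 0 else None
    upper = entries[pos][1] if pos < len(entries) else None
    return lower, upper


# Made-up example data, for illustration only.
known = {
    "09KABUL100": date(2009, 1, 15),
    "09KABUL300": date(2009, 2, 20),
    "09PARIS50": date(2009, 1, 3),
}
index = build_index(known)
print(date_bounds("09KABUL200", index))
```

If the monotonic-sequence assumption does not hold for some origin, the two bounds can come out in the wrong order, which would be a useful signal in itself.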

wlwardiary commented 13 years ago

The full list of missing cables with reference count is here: https://github.com/wlwardiary/cable2graph/blob/master/diff_cnt.list

It might make sense to use only IDs with more than one reference, to filter out typos and mistakes?

jstray commented 13 years ago

I like the time range idea, especially if you lay out the graph of each thread with time along one axis. Then draw something that looks like error bars for the date uncertainty.

As for typos, let's wait until we see how many typos there are -- or perhaps how many cables are only referenced once? I suspect quite a lot of them, so keeping only doubly-referenced cables might really cut down what you could learn from this type of analysis.

wlwardiary commented 13 years ago

There are 56251 referenced but missing cables. 13370 of them are mentioned more than once.

$ grep -cvE '^1 ' diff_cnt.list
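For anyone who wants the IDs and not just the count, here is a rough Python equivalent of the grep above. It assumes each line of diff_cnt.list is `<count> <cable id>` separated by whitespace, which is what the `'^1 '` pattern suggests but is not spelled out here. The function name `split_by_reference_count` is hypothetical.

```python
def split_by_reference_count(lines):
    """Split cable IDs into those referenced once and those referenced more often.

    Each line is assumed to look like "<count> <cable id>".
    Lines that do not fit that shape are skipped.
    """
    single, multiple = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        count, cable_id = int(parts[0]), parts[1]
        if count > 1:
            multiple.append(cable_id)
        else:
            single.append(cable_id)
    return single, multiple


# Made-up sample lines, for illustration only.
sample = ["1 09KABUL200", "3 08STATE4567", "12 07PARIS89", "", "1 06CAIRO11"]
single, multiple = split_by_reference_count(sample)
print(len(single), len(multiple))
```

With the real file, something like `split_by_reference_count(open("diff_cnt.list"))` should work, since iterating a file object yields its lines.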

It's also possible to extract the referenced cable IDs from the header part of a cable via regex.

See code here: https://github.com/wlwardiary/cable2graph/blob/master/ref.py#L48
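To illustrate the general idea (this is not the regex used in ref.py, just a sketch under assumptions): suppose references sit on a `REF:` line in the header and each one looks like a two-digit year, an origin name and a number, optionally separated by spaces. The patterns and the name `referenced_ids` are hypothetical, and stripping leading zeros from the number is also an assumption about how IDs are normalised.

```python
import re

# Assumed: references are listed on a single line starting with "REF:".
REF_LINE = re.compile(r"^REF:[ \t]*(.+)$", re.MULTILINE)
# Assumed ID shape: two-digit year, origin of 3+ capitals, sequence number.
CABLE_ID = re.compile(r"\b(\d{2})\s?([A-Z]{3,})\s?(\d+)\b")


def referenced_ids(header):
    """Return normalised cable IDs found on REF: lines of a header."""
    ids = []
    for line_match in REF_LINE.finditer(header):
        for year, origin, seq in CABLE_ID.findall(line_match.group(1)):
            # Drop spaces and leading zeros so variants map to one ID.
            ids.append(f"{year}{origin}{int(seq)}")
    return ids


# Made-up header, for illustration only.
header = (
    "SUBJECT: EXAMPLE\n"
    "REF: A. 08KABUL1234 B. 09 STATE 00567\n"
    "CLASSIFIED BY: SOMEONE\n"
)
print(referenced_ids(header))
```

Real headers are messier than this (multi-line REF blocks, references without a year), so a pattern this strict would miss some references, and a looser one would pick up more of the typos discussed above.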

wlwardiary commented 13 years ago

dates.list and diff_cnt.list now have all the data needed for this task.

jstray commented 13 years ago

Awesome! I should be able to work on layout again on Wednesday.

On Friday, September 9, 2011, wlwardiary <reply@reply.github.com> wrote:

dates.list and diff_cnt.list now has all the data needed for this task


wlwardiary commented 13 years ago

solved

https://github.com/wlwardiary/cable2graph/commit/2e7a1815599c4dde501f2fef490ce32343d8739c