Closed by wlwardiary 13 years ago
The full list of missing cables with reference count is here: https://github.com/wlwardiary/cable2graph/blob/master/diff_cnt.list
It might make sense to use only IDs with more than one reference to remove typos and mistakes?!
I like the time range idea. Especially if you layout the graph of each thread with time along one axis. Then draw something that looks like date uncertainty error bars.
As for typos, let's wait until we see how many typos there are -- or perhaps how many cables are only referenced once? I suspect quite a lot of them, so keeping only doubly-referenced cables might really cut down what you could learn from this type of analysis.
56251 cables are referenced but missing. 13370 of them are mentioned more than once.
$ grep -cvE '^1 ' diff_cnt.list
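The same count can be sketched in Python for anyone who wants to filter further. This is only a sketch: it assumes each line of diff_cnt.list is "<count> <cable id>", which is what the `^1 ` grep pattern suggests, and the function name `count_multiply_referenced` and the sample IDs are made up for illustration. Unlike the grep, it skips blank or malformed lines instead of counting them.

```python
def count_multiply_referenced(lines):
    """Count entries whose leading reference count is greater than one.

    Assumes each line looks like "<count> <cable id>". That layout is
    inferred from the grep pattern, not checked against the real file.
    """
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        if int(parts[0]) > 1:
            total += 1
    return total


# Made-up sample lines in the assumed format.
sample = ["1 08STATE1", "2 09KABUL2", "13 07PARIS3", ""]
print(count_multiply_referenced(sample))
```

To run it on the real list, pass an open file handle, e.g. `count_multiply_referenced(open("diff_cnt.list"))`.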
It's also possible to extract the referenced cable IDs from the cable header via regex.
See code here: https://github.com/wlwardiary/cable2graph/blob/master/ref.py#L48
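Roughly, the idea looks like the sketch below. It is not the code from ref.py: the pattern (two-digit year, upper-case origin, serial number, as in "08STATE12345"), the assumption that references sit on a line starting with "REF", and the names `CABLE_ID_RE` and `extract_refs` are all guesses for illustration. Continuation lines of a multi-line REF block are not handled.

```python
import re

# Hypothetical pattern: two-digit year, upper-case origin, serial number,
# e.g. "08STATE12345". The actual regex in ref.py may differ.
CABLE_ID_RE = re.compile(r"\b(\d{2})([A-Z]{3,})(\d+)\b")


def extract_refs(header):
    """Return cable IDs found on header lines that start with "REF".

    Assumes references are written without spaces inside the ID.
    """
    refs = []
    for line in header.splitlines():
        line = line.strip().upper()
        if line.startswith("REF"):
            refs.extend(m.group(0) for m in CABLE_ID_RE.finditer(line))
    return refs


# Made-up header text in the assumed layout.
header = "SUBJECT: EXAMPLE\nREF: A. 08STATE12345 B. 08KABUL678"
print(extract_refs(header))
```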
dates.list and diff_cnt.list now have all the data needed for this task.
Awesome! I should be able to work on layout again on Wednesday.
On Friday, September 9, 2011, wlwardiary <reply@reply.github.com> wrote:
> dates.list and diff_cnt.list now has all the data needed for this task
Other cables in the same ID range could provide the lower and upper bounds. Even a guess at the time range would already be helpful.
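A minimal sketch of that bounding idea, under the assumption that serial numbers within one year and origin rise with time. The function name `date_bounds`, the `known` mapping, and the sample dates are made up for illustration and are not taken from dates.list.

```python
import bisect
from datetime import date


def date_bounds(missing_serial, known):
    """Bracket a missing cable's date using neighbours in the same ID range.

    `known` maps serial number -> date for cables that share the missing
    cable's year and origin. Assumes serial numbers rise with time.
    Returns (lower, upper); either may be None if no neighbour exists.
    """
    serials = sorted(known)
    i = bisect.bisect_left(serials, missing_serial)
    lower = known[serials[i - 1]] if i > 0 else None
    upper = known[serials[i]] if i < len(serials) else None
    return lower, upper


# Made-up serials and dates for one hypothetical year/origin.
known = {100: date(2008, 3, 1), 140: date(2008, 3, 9), 200: date(2008, 4, 2)}
print(date_bounds(150, known))
```

A missing cable below or above every known serial gets only one bound, which is still a usable guess for layout.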