Closed seanlaw closed 5 years ago
Let me know your conditions in detail.
dict
? (i.e. billions of paths)Do you have too many paths to be stored as dict? (i.e. billions of paths)
Yes, the paths are in the millions to likely billions and I want to reduce the memory consumption using ZDD
Approximated counts are acceptable? Or exact counts are required for all paths?
Approximated counts are acceptable if there is a guarantee in terms of error bounds. I would like to know like to know relative percentages or be able to ask "which paths make up the top 10-20% of paths". Of course, exact counts is preferred if possible.
Is the base/universe graph is fixed or dynamic?
Universe is fixed. At least, I can determine the universe of nodes beforehand.
Before I forget to ask, are negative integers allowed as node numbers? That is, can I use [-1, 3, 4, -8]
as a sequence input? I don't think I've ever gotten strings to work as node names either (i.e., ['a', 'b', 1, 'c']
I think that counting sketch is an appropriate technique for your requirements. Counting sketch is a data structure used to approximately maintain many counters in small memory. This technique has been developed in computer networking community for flow monitoring. There are some open-sourced implementation, as follows:
A related research paper is found at https://dlnext.acm.org/doi/abs/10.1145/3341302.3342076 .
Negative numbers can be used as node names in Graphillion. Actually, any hashable objects can be used.
I don't really understand the counting sketch work but it sounds like you would not recommend Graphillion for this?
No. ZDD, the internal structure of Graphillion, represents a set of sets. From the mathematical definition, elements in a set cannot be duplicated, i.e., every element accounts for one or zero in a set. This implies that ZDD cannot be used to count elements, though ZDD is good at computing the size of a set (the number of elements in a set).
In other words, if you want to count the number of subsets (or that of subgraphs) satisfying given constraints, ZDD is a good tool. However, if you want to associate a number with each element (with each path), ZDD doesn't work.
There is a solution to utilize ZDD as counters, that is, ZDD vector or ZDD base (an extension of ZDD, which is not implemented in Graphillion). However, counting sketch is more efficient if approximation is allowed.
That makes sense. Thank you!
Let's say that I have 8 paths/sequences where each number represents a node:
Is there a way to push these into a graph using Graphillion and then be able to keep track of the fact that paths
[1, 2, 3, 4, 5]
and[8, 9, 10]
each have a count of2
(i.e., these two paths are observed twice) while the other 6 paths have a count of1
?In my case, I don't know ahead of time that, say,
[1, 2, 3, 4, 5]
is counted twice since the data is large so I would need a way to update the count as new data is being read. Is this possible with Graphillion?