Closed kgtw closed 4 weeks ago
I believe I can picture how a trie would look in yaml, possibly similar to the above string format, but in CSV I don't know how you would represent a trie.
To give some more context we are retrofitting our internal tooling to use your library as it is far superior in functionality to what we already have (thanks for this by the way!). We already have a custom "schema" which is based on yaml, with in-line comments to represent metadata (a bit of a hack). With CSV output I'm thinking it would be similar, except without the indenting to indicate the hierarchy.
example yaml structure:
0.0.0.0/0:
10.0.0.0/8: #[alias=aws-prod-usw1, environment=production, cloud=aws, region=us-west-1]
10.0.0.0/13: #[owner=team-a, account=1122334455, type=vpc]
10.1.0.0/16: public-az1 #[type=subnet, zone=public]
10.2.0.0/16: public-az2 #[type=subnet, zone=public]
10.3.0.0/16: public-az3 #[type=subnet, zone=public]
10.4.0.0/16: private-az1 #[type=subnet, zone=private]
10.5.0.0/16: private-az2 #[type=subnet, zone=private]
10.6.0.0/16: private-az3 #[type=subnet, zone=private]
example csv structure:
0.0.0.0/0
10.0.0.0/8,alias=aws-prod-usw1,environment=production,cloud=aws,region=us-west-1
10.0.0.0/13,owner=team-a,account=1122334455,type=vpc
10.1.0.0/16,type=subnet,zone=public
...
I have an account at your company and have used its services, so hey, why not, the trie printing can probably be generalized.
Having given this some thought, I now realize that this can be done easily and efficiently with existing methods, specifically iterator methods. That is what the existing trie string method does.
The use of iterators is rather simple. You can see in the examples below, most of the code is formatting each line. The node iterators provide a lot of flexibility, because you can choose any of the different iterators to print in different orderings, and you have access to all the data you need.
For the current trie string, the pre-order traversal with lower node first provides the same ordering, and the iterator for that ordering is provided by the method ContainingFirstAllNodeIterator. If you only wanted to print the nodes explicitly added to the trie, you could use ContainingFirstIterator instead. There is a diagram in the Java docs illustrating six of the 8 available orderings.
First I show how to rewrite the existing tree string method, but instead of using indentation strings, it uses the chain of parent nodes of each node. This example shows how to provide all the possible context you might need when printing.
func printTrieUsingIterator[T ipaddr.TrieKeyConstraint[T], V any](
trie *ipaddr.AssociativeTrie[T, V]) string {
builder := strings.Builder{}
builder.WriteString("\n")
iter := trie.ContainingFirstAllNodeIterator(true)
for node := iter.Next(); node != nil; node = iter.Next() {
// print the correct indentation
var parents []*ipaddr.AssociativeTrieNode[T, V]
for parent := node.GetParent(); parent != nil; parent = parent.GetParent() {
parents = append(parents, parent)
}
currentLen := len(parents) - 1
for currentLen > 0 {
parent := parents[currentLen]
below := parents[currentLen-1]
if parent.GetLowerSubNode() == below && parent.GetUpperSubNode() != nil {
builder.WriteString(inBetweenElbows)
} else {
builder.WriteString(belowElbows)
}
parents = parents[:currentLen]
currentLen--
}
// print the final indent
if len(parents) == 1 {
parent := parents[0]
if parent.GetLowerSubNode() == node && parent.GetUpperSubNode() != nil {
builder.WriteString(leftElbow)
} else {
builder.WriteString(rightElbow)
}
}
// print the node
builder.WriteString(node.String())
builder.WriteString("\n")
}
return builder.String()
}
const (
nonAddedNodeCircle = "\u25cb"
addedNodeCircle = "\u25cf"
leftElbow = "\u251C\u2500" // |-
inBetweenElbows = "\u2502 " // |
rightElbow = "\u2514\u2500" // --
belowElbows = " "
)
This can be made more efficient using the iterator caching functionality which is available with top-down iterators starting from the root node. The parents of each node do not have to be followed each time. Instead, we cache the chain of parents with each sub-node as we visit each node.
func printTrieUsingIterator2[T ipaddr.TrieKeyConstraint[T], V any](
trie *ipaddr.AssociativeTrie[T, V]) string {
builder := strings.Builder{}
iter := trie.ContainingFirstAllNodeIterator(true)
for node := iter.Next(); node != nil; node = iter.Next() {
// print the correct indentation
var originalParents []*ipaddr.AssociativeTrieNode[T, V]
if cachedParents := iter.GetCached(); cachedParents != nil {
originalParents = cachedParents.([]*ipaddr.AssociativeTrieNode[T, V])
}
parents := originalParents
currentLen := len(parents) - 1
for currentLen > 0 {
parent := parents[0]
below := parents[1]
if parent.GetLowerSubNode() == below && parent.GetUpperSubNode() != nil {
builder.WriteString(inBetweenElbows)
} else {
builder.WriteString(belowElbows)
}
parents = parents[1:]
currentLen--
}
// print the final indent
if len(parents) == 1 {
parent := parents[0]
if parent.GetLowerSubNode() == node && parent.GetUpperSubNode() != nil {
builder.WriteString(leftElbow)
} else {
builder.WriteString(rightElbow)
}
}
// print the node
builder.WriteString(node.String())
builder.WriteString("\n")
// cache the extended parents slice
newParents := append(append(
make([]*ipaddr.AssociativeTrieNode[T, V], 0, len(originalParents)+1),
originalParents...), node)
iter.CacheWithLowerSubNode(newParents)
iter.CacheWithUpperSubNode(newParents)
}
return builder.String()
}
Now we show the same for yaml, which is simpler because less context is needed.
func printTrieYaml[T ipaddr.TrieKeyConstraint[T], V any](trie *ipaddr.AssociativeTrie[T, V]) string {
builder := strings.Builder{}
iter := trie.ContainingFirstAllNodeIterator(true)
for node := iter.Next(); node != nil; node = iter.Next() {
parentCount := 0
for parent := node.GetParent(); parent != nil; parent = parent.GetParent() {
parentCount++
}
if parentCount == 0 { // top node
builder.WriteString("---\n")
} else if parentCount > len(spaces) {
spaces += spaces
}
builder.WriteString(spaces[:parentCount])
builder.WriteString(node.GetKey().String())
builder.WriteString(fmt.Sprintf(": %v\n", node.GetValue()))
}
return builder.String()
}
var spaces = " " // 32 spaces
Here we show the caching functionality, making the traversal more efficient.
func printTrieYaml2[T ipaddr.TrieKeyConstraint[T], V any](trie *ipaddr.AssociativeTrie[T, V]) string {
builder := strings.Builder{}
iter := trie.ContainingFirstAllNodeIterator(true)
for node := iter.Next(); node != nil; node = iter.Next() {
parentCount := 0
if cachedDepth := iter.GetCached(); cachedDepth != nil {
parentCount = cachedDepth.(int)
}
if parentCount == 0 { // top node
builder.WriteString("---\n")
} else if parentCount > len(spaces) {
spaces += spaces
}
builder.WriteString(spaces[:parentCount])
builder.WriteString(node.GetKey().String())
builder.WriteString(fmt.Sprintf(": %v\n", node.GetValue()))
newDepth := parentCount + 1
iter.CacheWithLowerSubNode(newDepth)
iter.CacheWithUpperSubNode(newDepth)
}
return builder.String()
}
Finally, we show csv, where no context is needed when printing each line.
func printTrieCsv[T ipaddr.TrieKeyConstraint[T], V any](trie *ipaddr.AssociativeTrie[T, V]) string {
builder := strings.Builder{}
iter := trie.ContainingFirstAllNodeIterator(true)
for node := iter.Next(); node != nil; node = iter.Next() {
builder.WriteString(node.GetKey().String())
builder.WriteString(fmt.Sprintf(",%v\n", node.GetValue()))
}
return builder.String()
}
Further expanding on that...
If you were to further generalize this, to support as you say a "custom func for formatting", it might be something like this which does the iterating for you:
// uses ipaddr.CachingTrieIterator
func printTrieContainingFirst[T ipaddr.TrieKeyConstraint[T], V any](
writer io.StringWriter,
iter ipaddr.CachingTrieIterator[*ipaddr.AssociativeTrieNode[T, V]],
nodePrinter func(
writer io.StringWriter,
iter ipaddr.CachingTrieIterator[*ipaddr.AssociativeTrieNode[T, V]],
node *ipaddr.AssociativeTrieNode[T, V])) {
for node := iter.Next(); node != nil; node = iter.Next() {
nodePrinter(writer, iter, node)
}
}
And to support any trie ordering, you'd want this, in which case you'd not have the top-down caching functionality available:
// uses any iterator
func printTrie[T ipaddr.TrieKeyConstraint[T], V any](
writer io.StringWriter,
iter ipaddr.Iterator[*ipaddr.AssociativeTrieNode[T, V]],
nodePrinter func(
writer io.StringWriter,
node *ipaddr.AssociativeTrieNode[T, V])) {
for node := iter.Next(); node != nil; node = iter.Next() {
nodePrinter(writer, node)
}
}
and when using this, the nodePrinter
functions would be equivalent to the content of the for
loops in printTrieCsv
or printTrieYaml2
above, like so:
func printTrieYaml3[T ipaddr.TrieKeyConstraint[T], V any](trie *ipaddr.AssociativeTrie[T, V]) string {
builder := strings.Builder{}
printTrieContainingFirst(
&builder,
trie.ContainingFirstAllNodeIterator(true),
func(writer io.StringWriter,
iter ipaddr.CachingTrieIterator[*ipaddr.AssociativeTrieNode[T, V]],
node *ipaddr.AssociativeTrieNode[T, V]) {
parentCount := 0
if cachedDepth := iter.GetCached(); cachedDepth != nil {
parentCount = cachedDepth.(int)
}
if parentCount == 0 { // top node
builder.WriteString("---\n")
} else if parentCount > len(spaces) {
spaces += spaces
}
builder.WriteString(spaces[:parentCount])
builder.WriteString(node.GetKey().String())
builder.WriteString(fmt.Sprintf(": %v\n", node.GetValue()))
newDepth := parentCount + 1
iter.CacheWithLowerSubNode(newDepth)
iter.CacheWithUpperSubNode(newDepth)
},
)
return builder.String()
}
func printTrieCsv2[T ipaddr.TrieKeyConstraint[T], V any](trie *ipaddr.AssociativeTrie[T, V]) string {
builder := strings.Builder{}
var iter ipaddr.Iterator[*ipaddr.AssociativeTrieNode[T, V]] = trie.AllNodeIterator(true)
printTrie(
&builder,
iter,
func(writer io.StringWriter,
node *ipaddr.AssociativeTrieNode[T, V]) {
builder.WriteString(node.GetKey().String())
builder.WriteString(fmt.Sprintf(",%v\n", node.GetValue()))
},
)
return builder.String()
}
But I am not so sure there is much benefit to this, as you can see it does not shrink or simplify the code much, if at all. These new yaml and csv print methods have increased in length. So I am skeptical I will add printTrieContainingFirst
or printTrie
to the library, since they are small, and they don't really add much, as far as I can tell, although I might give it a bit more thought. Of course, you are free to use the two of them.
Appreciate the detailed response @seancfoley! As mentioned, we were already iterating over the trie, although we weren't taking advantage of the caching approach so thanks for showing that!
I agree with your findings, the current functions expose the adequate control over templating and that a custom function for formatting doesn't actually reduce the amount of code needed to be provided in the first place.
What might be useful for others who use this library, and to increase the discoverability is to document some of the examples you've provided above in the wiki?
Added an example.
When printing the trie it comes with a pre-defined format and structure.
We have some use cases where we would like to output the trie in both CSV, and YAML for various reporting and integrations in to other systems. Obviously we could iterate over the entire trie, although a preferred approach would be to support providing a custom func for formatting.