Closed chshersh closed 6 years ago
Okay, so this structure doesn't work:
data PrefixTree a
= Leaf (NonEmpty Text) a
| Branch (NonEmpty Text) (PrefixTree a) (PrefixTree a)
There're multiple reasons for this:
foo.bar
and x.y
in this structure since their common prefix is empty string. This can be workarounded by introducing fake empty string component to the beginning of each Key
. But this doesn't really solve problem...Branch
constructor has only two children while it actually can have multiple children. Let's say you have the following keys: a.b.x = 1
and a.b.z = 3
. This will be stored in data structure like this:Branch ("a" :| ["b"])
(Leaf ("x" :| []) 1)
(Leaf ("z" :| []) 3)
Now you want to insert a.b.y = 2
and you have problems... So Branch
should contain multiple PrefixTree
s.
2 is simpler to fix. We need to modify structure like this:
data PrefixTree a
= Leaf (NonEmpty Text) a
| Branch (NonEmpty Text) (HashMap Text (PrefixTree a))
So instead of storing only two children we store HashMap
from first component of resulting key to the rest PrefixTree
containing this component. I personally don't like this structure because it has some redundancy (first component is stored twice). I will think on this more... The following keys a.x.y = 1
and a.b.c = 2
and a.1.2 = 15
will look like this in new data strucutre:
Branch ("a" :| [])
{ "x" → Leaf ("x" :| ["y"]) 1
, "b" → Leaf ("b" :| ["c"]) 2
, "1" → Leaf ("1" :| ["2"]) 15
}
{}
braces mean HashMap
in this syntax. This structure allows to check whether we need to insert new Leaf
in this HashMap
or should we insert into existing PrefixTree
.
Regarding 1... Well, instead of having
type PrefixMap a = Maybe (PrefixTree a)
we just will have same HashMap
for PrefixTree
. So the resulting structures will look like this:
newtype Piece = Piece { unPiece :: Text }
newtype Key = Key { unKey :: NonEmpty Piece }
type Prefix = Key -- not sure about this though...
data PrefixTree a
= Leaf Key a
| Branch Prefix (PrefixMap a)
type PrefixMap a = HashMap Piece (PrefixTree a)
Okay, we have a problem. With this structure:
data PrefixTree a
= Leaf Key a
| Branch Prefix (PrefixMap a)
type PrefixMap a = HashMap Piece (PrefixTree a)
it's not possible to store both "a"
and "a.b"
at the same time which is a problem :slightly_frowning_face:
Need to think how to patch structure to not complicate things...
After discussion with @vrom911 we decided to use the following structure:
data PrefixTree a
= Leaf Key a
| Branch Prefix -- greatest common prefix
(Maybe a) -- value by key = prefix
(PrefixMap a) -- suffixes of prefix
type PrefixMap a = HashMap Piece (PrefixTree a)
This structure should allow us to solve all problems.
After thinking a little bit more on how to store
Key
components I came to conclusion that we really want to store component on each level ofToml
data type instead of storing whole key.Currently we have the following data types:
So each level can contain whole key. This is not really convenient actually... Because if you want to insert some key into such map, you need to scan whole
HashMap
in order to split by greatest common prefix. This sad and not really convenient...On the other hand, storing only single key component per level simplifies things a lot!
So we actually want to have something like this:
But with such scheme it's harder to implement pretty-printing for tables like this:
So we should analyze nested
Toml
s to not print table in this way:Also, during parsing we need to split every table key by component in insert them in this
HashMap
in some smart way. This is also undesirable.Currently I'm having hard problems with these issues:
25
10
So let's implement our own data structure! Basically what we want is to have something like monomorphic for
Key
HashMap
but which joins nodes by common prefix.I think that data structure can look like this:
And we need to have the following basic functions and instances: