Open chshersh opened 6 years ago
It looks like you're only looking to validate naming clashes, is that correct? If that is the case, I reckon I could do it incrementally by replacing your parser type with
type Parser = ParsecT Void Text (State (A Text))
where A
is some patricia tree or a set of names. Would this work?
@MitchStevens Maybe we also need to validate something else. It would really a great idea to walk through the whole TOML specification and find all places to analyse (researching existing TOML checkers is another good approach):
I personally don't want to implement this validation as a part of parsing, because:
tomland
. But I also don't want to have extremely poor performance of the library. Our parsing is already not that lightweight, so introducing extra State
to it might make things much slower..toml
files are not really a problem, it's good for validation to be opt-in instead of opt-out.From what I've seen on the spec page, the possible invalid states are :
I'm not sure I know how to proceed on this issue. Other parsers (https://github.com/sgarciac/bombadil/blob/5d9ac0ca9b17b0320b207dc99bb49ae1e5655630/src/tables.ts#L215, https://github.com/TheElectronWill/Night-Config/blob/f627c7875d9b811319f853a3d349d1458514d15c/toml/src/main/java/com/electronwill/nightconfig/toml/TomlParser.java#L40) build a Trie of keys, similar to your PrefixTree
. Keys are then inserted to determine if there are naming collisions . I think this is a good solution, but it relies on the mutability of the Trie.
I've seen people get around this problem in Haskell using zippers. I think It could work, but it seems like too much work to solve such a simple problem and I wonder if there is something I am missing.
@MitchStevens This is tricky question, actually. I need to think more about the possible design. For now, it's not obvious, how to implement this logic properly with existing code.
I'm leaning towards checking these during parsing.
One reason is that for example "a=1 \n a=2"
will be accepted and treated as "a=2"
because of the way key/values are inserted in the TOML
. There is no way to check that after parsing.
Maybe we don't need an extra state, in my PR for arrays of tables I'm checking keys manually. I'm not what that means for the performance.
And what about returning Maybe TOML from the insert functions?
Currently TOML
object is created using fromList
function. I guess refactoring this in order to add validation will change code significantly.
Not everything can be guaranteed statically. And some errors are not parsing errors. So we should write some validating function which checks for the following things:
[x] Defining a key multiple times is invalid.
[ ] A key already defined may not be appended
[x] Can't have key and table with the same name.
[a] x = 2
[a] b = 1
[a.b] c = 2
[x] Attempting to define a normal table with the same name as an already established array must produce an error at parse time.