Implement TOML validator

chshersh commented 6 years ago

Not everything can be guaranteed statically. And some errors are not parsing errors. So we should write some validating function which checks for the following things:

[x] Defining a key multiple times is invalid.

# DO NOT DO THIS
name = "Tom"
name = "Pradyun"

[ ] A key already defined may not be appended
```
# THIS IS INVALID
a.b = 1
a.b.c = 2
```
[x] Can't have key and table with the same name.
```
# DO NOT DO THIS
a = 2
```

[a] x = 2

- [x] Can't have duplicating table names
```toml
# DO NOT DO THIS

[a]
b = 1

[a]
c = 2

[ ] Can't have nested same table and keys
```
# DO NOT DO THIS EITHER
```

[a] b = 1

[a.b] c = 2


- [ ] Attempting to append to a statically defined array, even if that array is empty or of compatible type, must produce an error at parse time.
```toml
# INVALID TOML DOC
fruit = []

[[fruit]] # Not allowed

[x] Attempting to define a normal table with the same name as an already established array must produce an error at parse time.

# INVALID TOML DOC
[[fruit]]
name = "apple"

[[fruit.variety]]
name = "red delicious"

# This table conflicts with the previous table
[fruit.variety]
name = "granny smith"

MitchStevens commented 6 years ago

It looks like you're only looking to validate naming clashes, is that correct? If that is the case, I reckon I could do it incrementally by replacing your parser type with

type Parser = ParsecT Void Text (State (A Text))

where A is some patricia tree or a set of names. Would this work?

chshersh commented 6 years ago

@MitchStevens Maybe we also need to validate something else. It would really a great idea to walk through the whole TOML specification and find all places to analyse (researching existing TOML checkers is another good approach):

https://github.com/toml-lang/toml

I personally don't want to implement this validation as a part of parsing, because:

High parsing performance is not a goal of tomland. But I also don't want to have extremely poor performance of the library. Our parsing is already not that lightweight, so introducing extra State to it might make things much slower.
I like the approach of separate phases. It's easier to think only about parsing while performing parsing, think only about validation while performing validation, think only about conversion while performing conversion. Just makes code cleaner and easier to reason about.
And, again, since validation can decrease performance and since in real life semantically incorrect .toml files are not really a problem, it's good for validation to be opt-in instead of opt-out.

MitchStevens commented 6 years ago

From what I've seen on the spec page, the possible invalid states are :

Key Clashing issues
Invalid Bare Key (validated in the parser)
Empty Key (validated in the parser)

I'm not sure I know how to proceed on this issue. Other parsers (https://github.com/sgarciac/bombadil/blob/5d9ac0ca9b17b0320b207dc99bb49ae1e5655630/src/tables.ts#L215, https://github.com/TheElectronWill/Night-Config/blob/f627c7875d9b811319f853a3d349d1458514d15c/toml/src/main/java/com/electronwill/nightconfig/toml/TomlParser.java#L40) build a Trie of keys, similar to your PrefixTree. Keys are then inserted to determine if there are naming collisions . I think this is a good solution, but it relies on the mutability of the Trie.

I've seen people get around this problem in Haskell using zippers. I think It could work, but it seems like too much work to solve such a simple problem and I wonder if there is something I am missing.

chshersh commented 6 years ago

@MitchStevens This is tricky question, actually. I need to think more about the possible design. For now, it's not obvious, how to implement this logic properly with existing code.

jiegillet commented 5 years ago

I'm leaning towards checking these during parsing. One reason is that for example "a=1 \n a=2" will be accepted and treated as "a=2" because of the way key/values are inserted in the TOML. There is no way to check that after parsing. Maybe we don't need an extra state, in my PR for arrays of tables I'm checking keys manually. I'm not what that means for the performance.

noahhaasis commented 5 years ago

And what about returning Maybe TOML from the insert functions?

chshersh commented 5 years ago

Currently TOML object is created using fromList function. I guess refactoring this in order to add validation will change code significantly.

https://github.com/kowainik/tomland/blob/6fe11cc2f79d742f0b7c9089bdfe70bf1b6aa776/src/Toml/Parser/TOML.hs#L45-L48

kowainik / tomland

Implement TOML validator #5