not a compliant parser - Githubissues

BurntSushi commented 11 years ago

I've created an interface for your parser to work with toml-test, which automatically tests your parser with a whole bunch of test cases. (I'm at 50 right now.)

Your parser fails quite a few tests.

(One of the tests I know is bogus because of floating point stuff. I'm trying to figure out how to fix that.)

uiri commented 11 years ago

Thank you, I intended to do this at some point. Some of the errors seem to have to do with errors in your code translating from my lib to a format which your tester understands (e.g. arrays without opening brackets, the UnicodeEncodeError). Some of them have to do with being too liberal with the input that my parser accepts. Others have to do with changes I made and weren't adequately tested (if they were adequately tested they wouldn't cause test failures, obviously). I'm not sure how much I ought to enforce homogeneity in arrays even though it is in the spec because toml is primarily for configuration rather than data interchange.

BurntSushi commented 11 years ago

Thank you, I intended to do this at some point.

No problem. I'm trying to get people moving and getting their parsers tested :-)

Some of the errors seem to have to do with errors in your code translating from my lib to a format which your tester understands (e.g. arrays without opening brackets, the UnicodeEncodeError)

The only such test is string-byte-escapes. I've double-checked the other tests, and your parser still fails them. A gist that doesn't use the automated program.

I'm not sure how much I ought to enforce homogeneity in arrays even though it is in the spec because toml is primarily for configuration rather than data interchange.

Why would you release a "parser for TOML" and purposefully not be compliant with the spec in such a big way? Homogeneity isn't just for data exchange, it's to ensure well-typed structured data. This is especially useful for static languages. Python isn't a static language, but that doesn't mean dynamic languages get its own version of the TOML spec. Moreover, true homogeneity is hopefully coming soon.

If you won't implement the spec, I suggest that you make that clear somehow.

uiri commented 11 years ago

The only such test is string-byte-escapes. I've double-checked the other tests, and your parser still fails them. A gist that doesn't use the automated program.

Yes, I realize now that I've been chopping off the opening brackets in arrays. How embarassing. I fixed that, will push a commit to github in a second.

Why would you release a "parser for TOML" and purposefully not be compliant with the spec in such a big way? Homogeneity isn't just for data exchange, it's to ensure well-typed structured data. This is especially useful for static languages. Python isn't a static language, but that doesn't mean dynamic languages get its own version of the TOML spec. Moreover, true homogeneity is hopefully coming soon.

If you won't implement the spec, I suggest that you make that clear somehow.

This isn't some sticking point for me where I disagree with the spec and will refuse to enforce type homogeneity in arrays but rather, due to the way I threw together the parser, adding in enforcement for homogeneous arrays is something which I need to be careful about implementing. If someone is going to beat me to the punch and send in a pull request, I'll gladly take it :). I'll see if I can get around to implementing it this week or next regardless of other people's contributions. Especially if tuples make their way into the spec.

BurntSushi commented 11 years ago

@uiri Ah, fair enough. I misinterpreted your intentions! I've already written an article that I intend to publish if and when true homogeneous arrays/tuples lands that talks about type checking everything (including the elusive empty list). Hopefully that will help.

Also, I've updated the original gist with results from the latest pull. I've also fixed the floating point bug (temporarily). Things are looking better :-)

uiri commented 11 years ago

I'm fairly certain that duplicate keys and empty keygroups are allowed. For the former, the latter key takes precedence and for the latter, the result is supposed to be an empty hash. The implicit-and-explicit-after seems to me to be a nontrivial fix like keeping track of the type of arrays. I'm not sure why string no close isn't raising an error and key-no-whitespace looks like something which should be allowed but I guess the spec doesn't agree with me on that one. Maybe some clarification is in order… I'll put in some more fixes tomorrow if I find the time. I think I touched on or fixed all the tests I was failing?

BurntSushi commented 11 years ago

From the spec:

Be careful not to overwrite previous keys. That's dumb. And should produce an error.

Duplicate keys aren't allowed.

empty keygroups are allowed

This is possible. The spec doesn't clarify it. I believe there are some open issues regarding it. I might remove the test until it's clarified. But since duplicate keys are allowed, there could only be one empty keygroup.

The implicit-and-explicit-after seems to me to be a nontrivial fix

Not sure. You just need to keep track of whether a hash was created implicitly or explicitly. Sounds like an extra dictionary (or even better, a set) to me.

I think I touched on or fixed all the tests I was failing?

Mostly. There are a few others: string-escapes (something weird is going on there), key-with-pound, key-special-chars, float-no-leading-zero and float-no-trailing-digits.

Thanks for your patience and helping me get the kinks worked out of toml-test!

uiri commented 11 years ago

From the spec:

Be careful not to overwrite previous keys. That's dumb. And should produce an error. Duplicate keys aren't allowed.

I was under the impression that what is forbidden by that is strictly that example - overwriting a key with a keygroup. I can't find it now for some reason, but I could have sworn I saw it clarified that in a badly done config file, the latter version of a key overrides the previous version.

empty keygroups are allowed This is possible. The spec doesn't clarify it. I believe there are some open issues regarding it. I might remove the test until it's clarified. But since duplicate keys are allowed, there could only be one empty keygroup.

Maybe the spec should be editted to clarify this, but see https://github.com/mojombo/toml/issues/30#issuecomment-14004686

The implicit-and-explicit-after seems to me to be a nontrivial fix

Not sure. You just need to keep track of whether a hash was created implicitly or explicitly. Sounds like an extra dictionary (or even better, a set) to me.

Yes, another thing to track in a parser which is relatively stateless…

I think I touched on or fixed all the tests I was failing?

Mostly. There are a few others: string-escapes (something weird is going on there), key-with-pound, key-special-chars, float-no-leading-zero and float-no-trailing-digits.

I fixed those except the floats. I need to add some checks in for them, I think, but it doesn't seem to be a big deal.

Thanks for your patience and helping me get the kinks worked out of toml-test!

Thank you for writing it; it is a lot more thorough than my own simple testing.

BurntSushi commented 11 years ago

See the end of issue #81. Part of the problem with file inclusion was one of keys being overwritten. mojombo seemed to implicitly reinforce this notion.

Maybe the spec should be editted to clarify this, but see [mojombo's comment here.]

An empty key group can only be represented as []. That particular issue is talking about a named keygroup without any keys.

Yes, another thing to track in a parser which is relatively stateless…

Part of the problem might be the combination of lexing and parsing into one process. A finite state machine (a lexer) works great for picking out syntactic categories, while a parser can use that information to enforce global constraints like duplicate keys, type restrictions, etc.

uiri commented 11 years ago

See the end of issue #81. Part of the problem with file inclusion was one of keys being overwritten. mojombo seemed to implicitly reinforce this notion.

Not sure. I think it was that the included file would go under a keygroup which would introduce another possibility of overwriting a key with a keygroup.

An empty key group can only be represented as [] . That particular issue is talking about a named keygroup without any keys.

I think you mean {} . Is my parser representing empty keygroups as something else…?

All the tests should now be passing except for the implicit-and-explicit, the mixed arrays and the empty keygroup ones.

BurntSushi commented 11 years ago

I think you mean {} . Is my parser representing empty keygroups as something else…?

I don't know. My tests check that [] is rejected but that [a] with no keys defined is valid, where a maps to an empty hash. I assumed that you were referring to [] initially.

The latter is absolutely correct according to the current spec. The only point of debate here is whether [] is allowed as a valid keygroup. Namely, are keygroups with zero-length names allowed? I don't know. My guess is no. (Why? Because it's stupid.)

uiri commented 11 years ago

All of the tests except the implicit and explicit after should be passing now...

BurntSushi commented 11 years ago

Nice work :-)

Although this is what I get after a pull:

[andrew@Liger uiri-toml] toml-test ./toml-test.py 
Test: empty-implicit-keygroup (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------
Test: overflow-int (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------
Test: overflow-neg-int (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------
Test: implicit-and-explicit-after (valid)

Traceback (most recent call last):
  File "./toml-test.py", line 36, in <module>
    tdata = toml.loads(sys.stdin.read())
  File "/home/andrew/clones/uiri-toml/toml.py", line 68, in loads
    raise Exception("What? "+group+" already exists?"+str(currentlevel))
Exception: What? a already exists?{'a': {'b': {'c': {'answer': 42}}}}

-------------------------------------------------------------------------------
Test: key-special-chars (valid)

Traceback (most recent call last):
  File "./toml-test.py", line 36, in <module>
    tdata = toml.loads(sys.stdin.read())
  File "/home/andrew/clones/uiri-toml/toml.py", line 75, in loads
    raise Exception("Missing whitespace between key name and =")
Exception: Missing whitespace between key name and =

48 passed, 5 failed

uiri commented 11 years ago

[andrew@Liger uiri-toml] toml-test ./toml-test.py Test: empty-implicit-keygroup (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------

Fixed this just now :)

Test: overflow-int (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------
Test: overflow-neg-int (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------

These are bogus tests. The spec only specifies integer size as 64-bit minimum. In fact, I think these should be in the valid tests to make sure people aren't using 32-bit integers.

Test: key-special-chars (valid)

Traceback (most recent call last):
  File "./toml-test.py", line 36, in <module>
    tdata = toml.loads(sys.stdin.read())
  File "/home/andrew/clones/uiri-toml/toml.py", line 75, in loads
    raise Exception("Missing whitespace between key name and =")
Exception: Missing whitespace between key name and =

I'm just incredulous that equal signs are allowed in key names. It just never occured to me how wacky Keys start with the first non-whitespace character and end with the last non-whitespace character before the equals sign. could be…

BurntSushi commented 11 years ago

These are bogus tests. The spec only specifies integer size as 64-bit minimum. In fact, I think these should be in the valid tests to make sure people aren't using 32-bit integers.

!!! I missed the word "minimum" in the spec. Very nice catch. Thank you :-)

BurntSushi commented 11 years ago

Wow nice work!

All looks well, except for #3 in toml-test. I've been convinced that

key= 1

is allowed. So your parser only fails one last test (which I added today): key-two-equals.

Keep in mind that the implicit/explicit group stuff is still open to interpretation. You don't have to listen to me. It's just my understanding and it could be wrong. We're still waiting for a clarification. (What we really need is an EBNF, and we can all stop trying to interpret the subtleties of English.)

uiri commented 11 years ago

All looks well, except for #3 in toml-test. I've been convinced that
key= 1
is allowed. So your parser only fails one last test (which I added today): key-two-equals.

Keys start with the first non-whitespace character and end with the last non-whitespace character before the equals sign.

This sentence from the spec is starting to look like one of those pictures with two interpretations to me. The answer is, of course, to support both!

Keep in mind that the implicit/explicit group stuff is still open to interpretation. You don't have to listen to me. It's just my understanding and it could be wrong. We're still waiting for a clarification. (What we really need is an EBNF, and we can all stop trying to interpret the subtleties of English.)

I'm not sure… it would seem to make sense that you can define a parent keygroup after it has children. Covering more cases can't be bad, right?

BurntSushi commented 11 years ago

Covering more cases can't be bad, right?

Emphatically yes. One of the hardest parts of maintaining a project like TOML—and I don't envy @mojombo for this—is refraining from adding too much complexity.

Simplicity is the ultimate sophistication. :-)

uiri commented 11 years ago

Can you send a pull request with your test script? Or is it OK if I include it? The license is MIT, you should probably be added to the copyright notice if your code is included.

BurntSushi commented 11 years ago

Nice! You pass all tests :-)

Or is it OK if I include it? The license is MIT, you should probably be added to the copyright notice if your code is included.

While I didn't explicitly specify a license with this test inferface, all of my hobby code is released under the WTFPL.

And it's just a 30 line test interface. It was in the gist (although my pull request might have a few tweaks since then). I used a virtually identical interface to test the rest of the Python TOML parsers :-)

In short: do whatever you want. My name in the commit logs is good enough for me.

uiri commented 11 years ago

Closing this since it passes all the tests :)

uiri / toml

not a compliant parser #4