Open alerque opened 5 years ago
If the latter, what is the actual rule? Merge all Junks excluding ones that start with
#
?
The actual rule is “Try to parse a placeable, and if you find an error inside, then everything you managed to consume up until this point is a single Junk”. It’s surely not conformant. Nice catch.
I have another question. We should (assuming the spec is to stay as is) split a Junk into multiple ones, but should we give an additional Error on each of them for the sake of uniformity? The translator perhaps made a mistake somewhere in their Placeable; will one Error be enough? On the other hand, the only way to reproduce this is not to indent lines, and it’s probably very rare.
Yeah, this is a bug in the js (and probably python) impls. Also a bug here for not having exhaustive tests on that behavior.
We had an earlier version of this bug in https://github.com/projectfluent/fluent.js/issues/248, but probably closed that too eagerly.
A bit of additional information that's not clear from the docs, the EBNF is a human readable doc, generated from the spec. It's not really the specification, nor is it exhaustive. The normative parts are the js files in syntax
, notably grammar.js
and abstract.js
. Grammar being well-formed, grammar and abstract being valid Fluent syntax.
Thanks for the feedback @Pike. I've opened matching issues on the two implementations I confirmed did not work correctly. It shouldn't be to hard to add something to the reference fixtures to test this (especially since the reference parser works as expected). Would you like me to send a PR for that?
I did eventually figure out the EBNF was generated and secondary to the actual implementation, but you are right that is not at all clear from the available docs. As far as something to follow along with when writing a parser in another language I find it much more useful than the very idiomatic JavaScript. Perhaps that's because I was writing a PEG grammar which is so close to the EBNF it could almost (but not quite) have been auto-generated.
The EBNF defines Junk as:
This makes sense to me in the context of
fixtures/comments.ftl
:This produces one Comment node and three Junk nodes because the sequential Junk lines start with a `#" with indicates a new Junk.
What I can't make sense of or seem to match is what the tooling parser does with sequential junk as in
structure/unclosed.ftl
:If I'm reading the spec right this should produce 4 Junks because the lines starting with "F", "a", and "n" are potentially new Entries and would break the grammar definition of a Junk. The lines starting "," would both naturally be part of the previous Junks.
That's not what the Javascript implementation is actually doing though, it is combining all the Junks up until the start of
key04
:Is the tooling parser off-spec here? Or am I missing something?
There doesn't seem to be a reference fixture that would trigger this. I mocked up a quick one and tested with the reference parser and the result matches my understanding of the spec.
I'm trying to figure out how to implement this is in fluent-lua and am unsure whether to follow the letter of the law here and stick to the spec or take a cue from other tooling implementations and follow their suit.
If the latter, what is the actual rule? Merge all Junks excluding ones that start with
#
?