projectfluent / fluent-rs

Rust implementation of Project Fluent
https://projectfluent.org
Apache License 2.0
1.04k stars 95 forks source link

Create generic ftl serializer #241

Closed RumovZ closed 1 year ago

RumovZ commented 2 years ago

This updates @Michael-F-Bryan's work from #184 to work with the current master branch. It started out as more of a quick and ugly fix, so please let me know if there's anything you want me to smooth out.

zbraniecki commented 2 years ago

@RumovZ thank you for your contribution and apologies for a late response.

I like your PR and I think it's generally landable, but I'd like to ask for one change - for round trip validation instead of writing unit tests with strings of FTL, could you leverage existing fixtures - https://github.com/projectfluent/fluent-rs/tree/master/fluent-syntax/tests/fixtures ?

The way I imagine it is that you'd add a ./fluent-syntax/tests/serializer_fixtures.rs and pull FTL files from the fixtures directory and round trip them to see if the output is the same.

If any of the FTL doesn't roundtrip properly, you could have a blacklist static array with names of files that don't roundtrip correctly yet to fix in follow ups.

For such files we'd like to make sure that the AST created from it at least doesn't crash when serialized.

Let me know how it sounds to you.

I'm also adding @eemeli as a second reviewer.

zbraniecki commented 2 years ago

In the unit tests, I'd like to instead see a very small and basic corpus of tests that are parse -> modify -> serialize operations like:

Feel free to pick any of the above, and it's totally ok if you don't add all. We can extend it as we go. I'd like to land your PR soon.

RumovZ commented 2 years ago

Sounds good, but it's been a while, and I'll have to reimmerse myself in the matter. Nevertheless, I'll definitely try to implement this.

RumovZ commented 2 years ago

I had a look and the fixtures aren't suitable for parse-serialise roundtrip testing (i.e. comparing plaintext), as they are not normalised. In fact, only the two empty files pass the test. A lot more pass the slightly weaker parse-serialise-parse roundtrip test (i.e. comparing ASTs), but the majority also fail this one. There may be actual issues with the serializer, but I think most fail due to inconsistent whitespace around junk, which might not be trivial for the parser to preserve. Would you like to see such a parse-serialise-parse roundtrip test? Then I'd investigate those issues.

For such files we'd like to make sure that the AST created from it at least doesn't crash when serialized.

Not sure what you mean by that. Aren't both parsing and serialising infallible, if we allow for junk?

alerque commented 2 years ago

Would you like to see such a parse-serialise-parse roundtrip test?

I don't think a serializer would be any use without the ability to round-trip a parsed AST.

I can understand not being able to round-trip from the existing fixtures as many of them are designed to be problematic sources that can be parsed (partially or fully) into functional ASTs. But if you have an AST, you should be able to serialize it and then get an identical AST back by deserializing. And if you can dump the serialized format back to an FTL file, that should also parse back to the same AST. Otherwise something is broken.

RumovZ commented 2 years ago

Certainly, but serialize(parse(ftl)) == ftl implies parse(serialize(parse(ftl))) == parse(ftl), whereas the reversal is not true, so I'm asking whether testing the latter is sufficient from the maintainers' point of view. I was under the impression that @zbraniecki was referring to the former, but maybe I had misunderstood.

gregtatum commented 1 year ago

I'm triaging the pull requests, and it appears that this one has feedback that is unaddressed and is stale. Feel free to re-open as needed.

Suggestion: If you want to continue this work, if it's possible to carve it out into smaller PRs, that would make it easier for maintainers to review and land. I'm not sure without digging into the PR if that's feasible here.

RumovZ commented 1 year ago

As is stated above, it is not possible to address @zbraniecki's feedback or it needs to be clarified. There is no point in splitting this PR if maintainers are not interested in a merge (nor would it make any sense from a technical point of view).

alerque commented 1 year ago

This PR is already a follow up to #184 and is an ongoing work for something that would be useful. It does not make sense to split it up into smaller PRs either. The design requirement needs to be settled on so it can be addressed, not closed. I think the diagnosis of "unaddressed feedback" is backwards here.

gregtatum commented 1 year ago

Ok, I can re-open, but I'm not sure I have the bandwidth myself right now to dive into it myself.

RumovZ commented 1 year ago

Thanks for getting back to me on this. After two hiatuses of half a year, it would be cathartic to see it merged. No problem regarding the doc issues, I can take care of that. As for the testing, I still have questions.

I see that, at the time, I implemented pretty much all the parse -> modify -> serialize tests zbraniecki had asked for. They were meant as a replacement, but it looks like you like the existing roundtrip tests. Should I still push them? Do you want to keep both?

Could you please give a few examples of how the existing fixtures are failing to roundtrip?

I'm not sure if we're talking about the same thing here. Please clarify if you're referring to comparing serializations or ASTs, like I've suggested here. If this is about serializations, it's quite simple: The serializer naturally spits out normlized text, whereas the fixtures tend to be edge cases. E.g. it's not reasonable to expect

extra-whitespace    =  Value

to pass a roundtrip test, as preserving redundant whitespace should not be a desirable characteristic.

So my one question from above remains: Are you guys okay with comparing ASTs when roundtripping the fixtures? It looks like I already managed to make all but one file pass this test.

zbraniecki commented 1 year ago

Ideally, I'd like to see: 1) For existing fixtures make sure that double-roundrip works (so yes, AST vs source->parse->AST->serialize->parse->AST). 2) Have a bunch of well formed FTL files that roundtrip (source vs source->parse->AST->serialize->source)

I think your current PR contains sufficient amount of both - I'd just suggest moving it out of serializer.rs to tests/ and place them as *.ftl files in some tests/fixtures/well-formed/* directory and rountrip (2) on all files in that directory.

gregtatum commented 1 year ago

What Zibi says sounds reasonable. Mainly I would like to see tests that provide good coverage to ensure we don't have bugs. The advantage of having .ftl files is that they are easy to write generic tests against beyond this single case. I don't mind the inline tests like you have in the current changeset, since they are acting as unit tests next to where the code is written. However, I'd also be fine with migrating them all to .ftl files at your discretion.

Thanks for getting back to me on this. After two hiatuses of half a year, it would be cathartic to see it merged.

Now that I'm up to speed, I'll make sure and be available to help get this merged in if you are available to help contribute it! Thanks for sticking through the slow process.

I'm not sure if we're talking about the same thing https://github.com/projectfluent/fluent-rs/pull/241#issuecomment-1108832900. Please clarify if you're referring to comparing serializations or ASTs, like I've suggested here.

Mostly I was nervous about the comment above that the lack of round tripping may be bugs.

tests/fixtures/well-formed/*

Suggestion: Maybe normalized would be better than well-formed, since whitespace differences would still be well-formed, but maybe not normalized.

RumovZ commented 1 year ago

Is the documentation sufficient?

The roundtrip tests revealed a few minor errors with the serializer and parser. I fixed one parse error, but one remains and prevents the last fixture from roundtripping correctly:

key12 =
    { "." }
        four

The parser swallows the four spaces in front of "four", which is wrong, if the Python implemenation is to be believed. This behaviour is also inconsistent with the parsing of

key12 =
{ "." }
    four

(the actual text in the fixture), in which case the whitespace is preserved.

I find the parser code quite hard to get into, so I won't try to fix this one and this PR is finished from my side.

zbraniecki commented 1 year ago

Thanks for catching that! @stasm - do you remember how you intended for the dedentation to work here?

gregtatum commented 1 year ago

I find the parser code quite hard to get into, so I won't try to fix this one and this PR is finished from my side.

@RumovZ Would you mind filing this as a new issue?

RumovZ commented 1 year ago

Amazing! Thank you for your feedback and support. In the end, I didn't submit an issue for c009674. It's not a bug if you consider two TextElements equivalent to a single TextElement with the concatenation of their texts. At least, it's explicitly mentioned now.

alerque commented 1 year ago

Thanks for all the work on this guys!