Leonidas-from-XIV commented 2 years ago

374 reports that the last line gets cut off when using `(* $MDX part-end *)`. This is generally true because `Part.Parse_parts.parse_line` can only parse a line into a `Part_end` or a `Normal`, but a line with an end-marker is sort of both.

This PR adds the prefix of the line to the Part_end constructor so if there is any code that is not whitespace, it will get added to the constructor from where the code can attach the missing line to the part.

I've been thinking about the design of this solution because attaching the value to the Part_end constructor feels wrong since Ocaml_delimiter only deals with delimiters and not the payload of the block and this is changing it. I feel that maybe the parsing ought to be done differently and signal that there is something else on the line that still needs to be processed so it would get processed into something more like [Normal prefix; Part_end].

As such, I welcome other suggestions how to solve this in a "nice" way, without breaking the separation between the parsing and the delimiters in ugly ways.

Fixes #374.

NathanReb commented 2 years ago

Ocaml_delimiter is only used by Parts so removing it entirely would not be too bad if we test the latter properly.

Parts is in a pretty sad state from the look of it so we could cease this as an occasion to improve it a little bit.

Another question I'm asking myself is whether this is the right way to fix this issue. It seems that enforcing delimiters sit on their own line is sensible. Maybe we should start a conversation with the ocamlformat team to try to see how we could come to an agreement on this.

If there is nothing to be done on their side we can revert to allow inline part, or at least end delimiters I guess. The reason I'd no be too kin to support it is that I think it can lead to pretty ambiguous cases, especially if in the example from the linked issue, the end delimiter was directly followed by a begin one. Inline begin delimiters can be tedious to deal with, especially regarding indentation.

Leonidas-from-XIV commented 2 years ago

I agree that the current implementation is not great, but I think it is good as a starting point for discussion. I think we agree that the current way, where it silently drops the last line is bad - even a version where it would error out would be better because it wouldn't lead to potentially subtle errors where the inlined code is simply missing lines but it is only noticed way too late. Throwing an error in that case would be pretty simple and require fewer changes than what's in this branch already.

As a matter of fact, we weren't aware of the issue until the bugreport either, so it shows how fragile the current baheviour is.

Making OCamlformat not put the delimiters in-line and emitting an error message if people do that by hand could be a decent compromise, because as you say, the semantics of in-line delimiters are a bit complicated.

If we do decide to go forward with allowing inline end-delimiters, I'll probably drop Ocaml_delimiter because the more I think about it the less sense it makes to split the code this way.

NathanReb commented 2 years ago

Oh no don't get me wrong, there's no particular problem with the implementation. I'm merely wondering what the best fix would be here!

Leonidas-from-XIV commented 2 years ago

I've been iterating on and off on this, thinking of smaller improvements that lead to other improvements, that again lead to other refactorings. As such the removal of the File_end constructor from Parts_part.t and the addition of Content in Ocaml_delimiter.t has shown that these types are identical, just spelled somewhat differently. Thus this made me unify the types, at which point it seemed to me that it would be most sensible to just join these two modules.

I've reworked the test a bit: the test cases of Parts_part and of Ocaml_delimiter were mostly similar except the latter tested the error cases as well. I've exposed a function that returns the result type and added these test cases to the Parts test as well.

I think this is overall an improvement, because the tests tested very similar functionality, if the Ocaml_delimiter test would fail, the Parts test with the same input would fail in the same way which did not add much value to the tests. Unifying this is in my opinion a bit easier to understand and also solves the design problem that lines previously could only be a separator or content.

Leonidas-from-XIV commented 2 years ago

I would generally agree on ocamllex. However I kept the parsers as regexp for now since one of them is deprecated (and we should try to remove it for MDX 3, https://github.com/realworldocaml/mdx/issues/396), at which point it should be easier to switch the other one to ocamllex instead of porting both now and deleting one later or having a mix of two different ways of parsing.

realworldocaml / mdx

Add prefix of the `part-end` delimiter to the `part` #387

374 reports that the last line gets cut off when using `(* $MDX part-end *)`. This is generally true because `Part.Parse_parts.parse_line` can only parse a line into a `Part_end` or a `Normal`, but a line with an end-marker is sort of both.

realworldocaml / mdx

Add prefix of the `part-end` delimiter to the `part` #387

374 reports that the last line gets cut off when using (* $MDX part-end *). This is generally true because Part.Parse_parts.parse_line can only parse a line into a Part_end or a Normal, but a line with an end-marker is sort of both.

374 reports that the last line gets cut off when using `(* $MDX part-end *)`. This is generally true because `Part.Parse_parts.parse_line` can only parse a line into a `Part_end` or a `Normal`, but a line with an end-marker is sort of both.