janestreet / ppx_custom_printf

Printf-style format-strings for user-defined string conversion
MIT License
23 stars 9 forks source link

Embed errors in the AST instead of raising #10

Closed panglesd closed 1 year ago

panglesd commented 1 year ago

Currently, when ppx_custom_printf encounters an error, it uses the raise_errorf function to raise a located error.

The exception is caught by ppxlib, which in this case:

The interruption of the rewriting is quite bad for the user experience! The implication for the users are:

For instance:

let time = 1

let invalid1 =
  Format.printf !"The time is %{invalid#invalid} and the timezone is." time

let invalid2 =
  Format.printf !"The time is %{invalid#invalid} and the timezone is." time

let valid = Format.printf !"The time is %{Valid} and the timezone is." time

would report several errors:

You can find more information about error reporting in PPXs in this section of the ppxlib manual.

:question: Would you be willing to accept contributions to this issue? I'm considering assigning its resolution as part of an outreachy internship: see more information here.

v-gb commented 1 year ago

Such a change seems reasonable, but why not change ppxlib instead to use the mechanism you refer to when an exception is raised instead of replacing the whole file? That'd seem much better.

panglesd commented 1 year ago

Thanks for your answer!

This is a valid question! First, let me argue that even if we changed the above mechanism in ppxlib, this issue is still relevant, as it is still better to embed errors instead of raising: embedding errors allows to report multiple errors at once!

For instance, in

let invalid1 =
  Format.printf !"The time is %{invalid#invalid} and the timezone is %{invalid#too}." time timezone

you ideally would like both invalid syntax to be reported at once. Also you would like to have that the format string has two "hole", for merlin to know the right arity of the function.

Now, why not change ppxlib's behaviour?

The behaviour I explained was introduced a bit more than one year ago (before, it was even worse: when an exception was caught, the whole ast was replaced by the error node; this means that only one error was reported at all, but more importantly, no Merlin feature (such as jump-to-definition) could work...) When deciding the right behaviour, we had two options:

  1. Catching an exception stops the rewriting process, and adds an error at the beginning of the last valid AST
  2. Catching an error adds an error to the last valid AST, and continues the rewriting.

We went for solution 1. I think the main reason were that we wanted to give a way for rewriters to stop the process, and exceptions seemed well suited for this. Another reason were that in any case it is better to embed errors. Finally, it was a smaller change in the behaviour... (But I confess I am not sure to recall or understand the reasons that lead to this choice!) I would be open to change the behaviour, but in any case, it would be even better to turn all rewriters into embedding errors instead of raising!

v-gb commented 1 year ago

Yeah, the change could still be worthwhile but my claim is that changing ppxlib gives you 95% of the value for 10% of the work. And once you have that, maybe the work to report multiple errors is still worthwhile, or maybe it complicates the code and doesn't seem worthwhile.

As for ppxlib's behavior, I have worked on a number of ppx for many years, and I cannot recall a single time where I've wanted to stop all processing. What kind of situation do you have in mind?

If your ppx claims "I'm rewriting [%foo]", it seems pretty clear to me that ppxlib can replace [%foo ...] by [%ocaml.error ...] if the ppx fails, and proceed. The rewrite is local, so the failures are also local. It's certainly the case for the errors from Located.raise_error, and most likely just true for every exception. I mean, you can exclude Break and Out_of_memory and Stack_overflow to behave marginally nicer, but that doesn't seem necessary to me, just nicer.

Similarly if your ppx is [@@deriving something], then ppxlib can add a [%%%ocaml.error ...] on failure to process something.

If your ppx is defined as a whole ast transformation, in that case, maybe it's not possible to do better than what ppxlib does, and avoiding raising would be better. But that's relatively few preprocessors.

panglesd commented 1 year ago

I agree with you, and (if the agreement in ppxlib's maintainer is confirmed) I will redirect the work to changing ppxlib's behaviour.

(As I tried to understand the reasons leading to this choice (which I now don't agree with), I remembered that I even opened a PR implementing your suggestion! (the PR would need a complex rebase though now).)

As I mentioned, even with the change in ppxlib's behaviour, embedding can still be worthwhile in order to report multiple errors. However, I agree that it can be quite a change in the codebase, for a smaller benefit.

So, I would like to now if you would be willing to accept such contributions? If yes, I'll close this issue and open another one, focused on "multiple error reporting". If you don't have the time resource to review, or don't think the risk of introducing new bugs is worth it, I totally understand it, and will simply close this issue.

In any case, thanks for your input!

v-gb commented 1 year ago

I think changing the code to embed errors when it leads to better behavior (not systematically replacing all raises) is fine, if the change is not too complicated. Here, I think it's only the code that parses the inside of %{...} that would need to be tweaked (and the tweak seems nice and local), not the code that looks for the %{.

panglesd commented 1 year ago

Replaced by #11.