ocaml-ppx / ppxlib

Base library and tools for ppx rewriters
MIT License
246 stars 98 forks source link

Pass attribute location to Deriving.Generator #60

Open NathanReb opened 5 years ago

NathanReb commented 5 years ago

Currenlty it seems that the location passed to the generator function

loc: Location.t -> path: string ->  `input_ast -> `output_ast

at least in the str_type_decl case is the loc for the whole Pstr_type node. This is a bit annoying because if you then use that loc for the generated nodes (which I assume is what most ppxlib users do) that means that any error in there will be reported on the whole type declaration. That can be a bit confusing to the end user of the deriving plugin.

What I'd like to be able to do instead is locate it on the attribute node for [@@deriving ...] which I believe would make more sense. It might even be worth locating it on the right part of the attribute payload, ie the one corresponding to the registered deriver.

I don't think there's a nice way, if there's even one, to get that location from the input ast nodes.

Would you consider changing what is passed as loc or at least adding a separate argument for the different relevant locations?

hhugo commented 5 years ago

What version are you using. I think it was fixed 2 months ago. https://github.com/ocaml-ppx/ppxlib/pull/55

NathanReb commented 5 years ago

I'm using 0.5.0.

NathanReb commented 5 years ago

Let me know if you want me to provide a minimal example.

ghost commented 5 years ago

I usually use the location of the element I'm processing. i.e. if I see x and need to produce a reference to sexp_of_x, then the location of sexp_of_x will be the same as x. That works relatively well in practice.

I suppose passing the location of the deriver name to the generator would be fine. Since this requires an API change, we should add a more future proof version of the make function, for instance one that takes a context as argument containing the various locations and the path.

NathanReb commented 5 years ago

I tend to find it a little unsettling when errors for generated code are located on code that I wrote myself, especially when the error is highlighted by merlin in my editor. I can imagine it being even more disturbing to ocaml newcomers. That might be completely subjective though.

I also feel like it would look more consistent with how ppxlib's extensions work.

From that same subjective point of view I think that'd be great to have such a context value passed to the generator. On the other hand, I wonder if consistency across ppxlib based plugins wouldn't be preferable. Having such a context with multiple locations to chose from might result in every plugin having its own conventions.

NathanReb commented 5 years ago

when errors for generated code are located on code that I wrote myself

Let me clarify that a bit since it will inevitably be located on code that I wrote. What I mean is I prefer the error to be located on the part of the code responsible for the code generation, ie in the case of a deriver, here:

type t = int
[@@deriving some_plugin]
            ^^^^^^^^^^^
ghost commented 5 years ago

Yh, I understand. At the same time, this might be confusing as well:

type t = int
[@@deriving some_plugin]
            ^^^^^^^^^^^
Error: Unbound value sexp_of_int

Ideally what we want is this:

type t = int
[@@deriving some_plugin]
            ^^^^^^^^^^^
In code generated by [@@deriving some_plugin]:
let sexp_of_t x = sexp_of_int x
                  ^^^^^^^^^^^
Error: Unbound value sexp_of_int
NathanReb commented 5 years ago

That would be a kick-ass error message but how doable is it? (my guess being not doable since the error is reported by the compiler afterward)

ghost commented 5 years ago

If we just want a message of the form:

type t = int
[@@deriving some_plugin]
            ^^^^^^^^^^^
In generated code:
let sexp_of_t x = sexp_of_int x
                  ^^^^^^^^^^^
Error: Unbound value sexp_of_int

Then we could achieve that by making sure to set the loc_ghost fields of all locations to true for generated bits. Then the compiler would know that this is generated code and would switch to pretty-printing it rather than fetching it from the source file.

If we want to also precise generated by [@@deriving some_plugin], then I suppose we could attach an attribute around the generated code.

NathanReb commented 5 years ago

The "generated by" is the cherry on top but even without it would be great and a massive improvement over the current situation. Especially in a world where the compiler would outline the error like in your example.

I hadn't noticed the loc_ghost field so far so I'm not exactly aware of how that would work. How do you get the compiler to pretty print loc_ghost ast nodes like this? Currently if I set loc_ghost to true the error is still simply reported with the location so I'm assuming there's either an option to enable somewhere or that it's a feature that doesn't exist yet.

What would be the steps to get there? I'd be really interested in helping making that happen!

Also we were discussing this with @emillon yesterday and he raised a couple interesting points which I'll try to summarize:

ghost commented 5 years ago

This feature doesn't exist yet. The preprocessed files dune generate contain binary ASTs, so they wouldn't be suitable here.

This feature could be implemented by modifying the code in the Location module of the compiler that prints code excerpt. When this function is given a ghost location, it would additionally pretty print a excerpt of the generated code.

The most difficult part is extracting the relevant part of the pretty-printed generated code. I suggest the following method:

let's call ast1 the final ast that is being compiled by the compiler, i.e. after all the preprocessing has happened.

  1. pretty print all of ast1 into a string, let's call s the result
  2. parse s as ast2
  3. assuming that ast1 and ast2 are equal modulo locations, build the mapping location of ast1 -> location of ast2 by traversing both ASTs simultaneously
  4. lookup the location received by the Location module in this mapping, let's call loc2 this location
  5. print the code excerpt for loc2 in s

The simultaneous traversal is quite a bit of code, though it can be automatically generated. I can point you to some code generator to easily generate it. Appart from that, it's all quite straightforward.

Note that for all this to work, individual ppx rewriters have to properly set the loc_ghost fields. However, I believe that such a feature would be a good enough incentive for authors of ppx rewriters to do it.