Adding paths to gospel identifiers

mrjazzybread commented 8 months ago

With these changes, gospel identifiers would have information regarding their fully qualified location from the base of the project. These paths are represented by a list of strings. For example, the identifier that represents the mem function in the Set module in the Gospel standard library would have a path field with the following list ["src/stdlib/gospelstdlib.mli"; "Set"].

n-osborne commented 8 months ago

Thank you for this PR!

I haven't yet look too much into the details of the code but I was wondering why include the file path in the beginning of the path? Would it not be better to have only module names? The file path information is already in the location of the identifier.

Also, could you please amend the documentation of the functions where you add the ~path extra argument? Some tests would be nice too.

mrjazzybread commented 8 months ago

Ok I will do that! I did some tests on my side to make sure the paths were working as intended but I can write some more. As for your question about the file path: when we create a record that represents the top-level module for a file, its name is the file's full path, so I thought the path parameter should be consistent with the names we give to modules. Should I only change how the path is represented or should I also change the module name?

n-osborne commented 8 months ago

I'm not sure I understand why the name of a module in the record representing the said mdule is the file path.

I guess you are talking about Tmodule.init_muc. We are creating an Ident.t with the file path as name and Location.none. I don't understand the reason why and I would rather keep location information in the Location and having a module name as module name. Does it make sense?

But that may be a complete other issue.

I've also noticed that the modifications proposed by this PR generate a lot of noise when running gospel check

$ cat > foo.mli << EOF
> type t
> (*@ ephemeral *)
> EOF
$ gospel check foo.mli

will print almost 500 symbols on stdout (with qualified name :-) )

mrjazzybread commented 8 months ago

In regards to the question of module names, I would agree with your proposal, I think it makes sense, is not breaking and makes the paths a bit simpler to implement. In regards to the noise: I am very sorry! I made some prints to ensure that they were being generated correctly and I forgot to delete them :((((( in my next commits these will be removed. As for the tests, since these paths are at this point only useful for tools that interface with gospel, testing its correctness is going to be a bit challenging. I would propose creating a new command that processes an .mli file with gospel annotations and prints the paths of all top level variables. Right now, I added a new flag to the check command that does just that, I am in the process of writing tests that ensure the output of that command is as expected.

n-osborne commented 8 months ago

Thanks for adding the tests!

You can also remove the print statements by amending the commit where they have been introduced, to keep the git history as clean as possible.

I'm not totally convince by having a flag on the gospel check command for test purpose only. If you modify how Ident.t are printed in the gospel dumpast command I think it would make more sense, as the path of the symbols is something we can want in this command indepently of testing.

The gospel dumpast command can be tested in a cram test. A bit verbose, but we only read it until its correct and then we just check that there is no diff.

Now, looking at what the OCaml compiler does, we are a bit different here. OCaml compiler doesn't include the path in the identifier but rather include a path and an identifier in the Texp_ident and other expression_descs. Not sure we should do the same, but I believe it is worth considering the alternatives.

Also, the name of the ident (id_str) is still not qualified (ie we don't have Longident). Maybe this could be an improvement (which involves some heavier change though).

For example:

$cat > bar.mli << EOF
> module Baz : sig
>   (*@ type bob *)
> end
> EOF
$ cat > foo.mli << EOF
> (*@ open Bar *)
>
> type t
> (*@ model m : Baz.bar *)
> EOF

will have the identifier of m's type having bar as name and Bar.Baz as path. Which is correct and we know where the symbol come from. But one could expect to have Baz.bar as name/identifier to keep the ast closer to the syntactic representation.

mrjazzybread commented 8 months ago

Hello, sorry for not responding to your last comment. In regards to your concerns over testing, I agree that having a flag specifically for testing should be avoided. I changed this and I added tests using dumpast and grep.

About the differences between my proposal and the OCaml compiler: I believe that the OCaml compiler keeps the paths in the constructors rather than the identifier because their paths are relative and ours are absolute. For my CFML translation, I never needed relative paths, but now I am considering the following scenario (which dovetails nicely with your concerns about longidents)

module A : sig
     (* predicate p (n : integer) *)
end

module B : sig
    val f : unit -> int
    (*@ ensures A.p result *) 
end

Since in the typed AST we only remain with p result translating it to CFML would be problematic. I suppose there are two solutions: adding relative paths like in OCaml or do as you said and keep the ast closer to the syntatic representation. I think I would prefer the former, mainly because I think it will keep identifiers easy to work with as well as requiring much less work to implement.

mrjazzybread commented 8 months ago

Hello, sorry for the spam :grimacing: I have changed the paths to be relative instead of absolute. This means that in your previous example

$cat > bar.mli << EOF
> module Baz : sig
>   (*@ type bob *)
> end
> EOF
$ cat > foo.mli << EOF
> (*@ open Bar *)
>
> type t
> (*@ model m : Baz.bar *)
> EOF

The path of bar' of would be Baz, instead of Foo.Baz. I think this makes storing identifiers as Longident to be redundant, since the syntactic representation is either equivalent to the relative path or has less information. I also don't think we miss out on much by not having absolute paths, but if you can think of any use cases, please let me know.

As for the question of storing relative paths in the constructors vs as a record field, I think it makes conceptually more sense to store them in identifiers since they are attributes that belong to the identifiers themselves and not of the terms to which they belong to.

Finally, as a heads up, I changed the pretty printing function that the dumpast command uses for identifiers to also print the relative path. This means that I had to slightly change the location test since it uses values from the Gospel standard library.

n-osborne commented 8 months ago

Thank you for amending the tests! Very nice use of grep, this reads really nicely :-) I think you just miss an EOF at the end of the multi-line cat command (CI is complaining about that on some platform).

Regarding the kind of path, I'm not sure what you mean. I believe that the OCaml typedtree has:

Path.t representing fully resolved paths, i.e names qualified relatively to the root of the project.
Longident.t representing qualified names as they are written in the source file, i.e names qualified relatively to the context (which modules has been opened).

My first impression was that we want the fully resolved path in the typed ast (so that we know where an identifier comes from without having to look at the context).

I was just wondering whether we want the identifier (not the path) part in Gospel to be more like the longident in OCaml, as we already have the information in the untyped AST (see the Uast.qualid type. But once we have the fully resloved path at hand, I guess having some equivalent to the longident is only usefull for pretty printing (which we do some other way). But modifying the identifier part, if we decide to do it, can be postpone to another PR (that may even makes more sense).

As for switching to relative path (relative to what is in scope IIUC), note that in ortac we don't necessarily process the file in the order of declaration, we can look for a specific information, so we don't maintain a context of what is in scope or not. For now we make do with the tagged identifier and the namespace that tells us where the symbol come from. But that's not a really pleasant API. Having the fully resolved path at hand would give us more flexibility on how we decide to handle things.

I'm very sorry that our discussion makes you go back and forth as the fully resolved path was your first proposal. Maybe postpone removing the last commit until we agree on a design.

n-osborne commented 8 months ago

Fixes #381

mrjazzybread commented 8 months ago

Hello

I have changed the git history, if everything is to your liking I think we can merge with main

n-osborne commented 7 months ago

Thanks, fantastic! There are still some conflicts with main to resolve before merging. We usually use git rebase in this project (to avoid too much merged commit in the history) Then I'll review the code in defail one last time (and depending on how this PR plays with ortac, I may want to postpone merging after next release).

n-osborne commented 7 months ago

The second commit (8da3f42) is empty. Also, I believe you should format each commit rather than having a special commit for formatting. And add an entry to the CHANGES.md file, this work deserves one :-)

n-osborne commented 7 months ago

Why?

mrjazzybread commented 7 months ago

I was trying to change the git history in my fork so that the previous commits were formatted, but I guess I must have messed something up, because this request was automatically closed :(((((((((( I'll make a new one

ocaml-gospel / gospel

Adding paths to gospel identifiers #389