ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.63k stars 401 forks source link

Debug mapping is sometimes inconsistent with installed locations #7413

Open richardlford opened 1 year ago

richardlford commented 1 year ago

Expected Behavior

Consider a simple project with one library and a single package, mypackage.

Here is the source layout:

Here is an excerpt from the dune-project file:

(lang dune 3.7)
(name mypackage)
(package
 (name mypackage))

And here is the "lib/dune" file:

(library
 (name lib1)
 (public_name mypackage.lib1)
 (modes byte)
 )

I did a dune install --prefix _opamroot and here is an excerpt of the layout it produced:

_opamroot/
_opamroot/lib
_opamroot/lib/mypackage
_opamroot/lib/mypackage/lib1
_opamroot/lib/mypackage/lib1/lib1.ml
_opamroot/lib/mypackage/lib1/lib1.cma

Now, when the debugger is debugging a program that used this library, we would want the debug information from the library to be:

/workpace_root/mypackage/lib1

so that after the debugger does a mapping from /workspace_root to opamroot/lib, that the directory will become

opamroot/lib/mypackage/lib1

which is where the source code and '.cma' file are located.

Actual Behavior

The following debug directories were produced in the 'lib1.cma':

The problem is that the debug information is missing information about which package contains the library. When building a module that is part of a package that will be potentially installed, the mapping should include the package name, and also be as if the library location was at the root of the package. So the BUILD_PATH_PREFIX_MAP in this case should be:

BUILD_PATH_PREFIX_MAP="/workspace_root/mypackage=$cwd/lib"

Reproduction

I will open a PR soon, but I think the above describes the issue adequately. Also, I plan to work on a fix for this Dune problem, and am also working on a Dune enhancement to start up ocamldebug with a BUILD_PATH_PREFIX_MAP to maps from /workspace_root to the actual source locations.

Specifications

rgrinberg commented 1 year ago

BUILD_PATH_PREFIX_MAP="/workspace_root/mypackage=$cwd/lib"

BUILD_PATH_PREFIX_MAP is a documented standard, I don't think we can just modify its meaning and syntax like that. Can you introduce another variable instead?

Also, could you clarify all the information that ocamldebug wants to look up in the first place. We have three different things:

What exactly does ocamldebug need?

richardlford commented 1 year ago

BUILD_PATH_PREFIX_MAP="/workspace_root/mypackage=$cwd/lib"

BUILD_PATH_PREFIX_MAP is a documented standard,

[If you were concerned by the $cwd in the example above, I did not mean to imply that BUILD_PATH_PREFIX_MAP should interpret dollar signs. I was just using it as shorthand for a path holding the current working directory.]

As you say, BUILD_PATH_PREFIX_MAP is a documented standard. The standard defines a general kind of mapping but does not specify the areas of application. The document focuses on reproducible builds and using the mappings to map from build-time absolute paths to abstract paths. I call that the "build phase".

But once reproducible artifacts are produced (containing abstract paths), the users of the artifacts need a way to give meaning to the abstract paths. That is also a kind of mapping, so the suggestion is, why not just invert the original mapping? We call the phase where the reproducible artifacts are being used, the "deployment phase".

Over in compiler-land, we've had a bit of discussion about this. See https://github.com/ocaml/ocaml/pull/12126. And a recently merged PR, https://github.com/ocaml/ocaml/pull/12138, added support for logically inverting a mapping.

I don't think we can just modify its meaning and syntax like that. Can you introduce another variable instead?

We are not modifying the syntax, but we are giving it some new semantics for use during the deployment phase. It still is holding mappings from right to left, and with right-most to left-most priority.

The compiler utilities in the Build_path_prefix_map module all take a map as a parameter, so they do not care where the mapping came from. On the other hand, the utilities in Location do look for the BUILD_PATH_PREFIX_MAP environment variable. We could pick a different name for the environment variable when used in the deployment phase, but unless a single tool is doing both build-phase and deployment-phase activities at the same time, I don't see why the same environment variable could not be used for both phases. I'm assuming the BUILD_PATH_PREFIX_MAP environment variable is set freshly each time a tool that needs it is invoked.

Also, could you clarify all the information that ocamldebug wants to look up in the first place. We have three different things:

  • Object files (cmo/cma)
  • Interface files (cmi)
  • Source files (ml, mli)

What exactly does ocamldebug need?

The debug information coming into ocamldebug has, for each compilation unit in the executable:

ocamldebug uses the Load_path module to look things up. It starts it out with the location of the stdlib, and with ".". Then it adds the directories from the debug information. Users can also explicitly add directories to it.

ocamldebug uses the Load_path to find:

The deployment-phase provides some expanded interpretation of the mapping. In particular, Location.rewrite_find_all_existing_dirs.

val rewrite_find_all_existing_dirs: string -> string list
(** [rewrite_find_all_existing_dirs dir] accumulates a list of existing
    directories, [dirs], that are the result of mapping a potentially
    abstract directory, [dir], over all the mapping pairs in the
    BUILD_PATH_PREFIX_MAP environment variable, if any. The list [dirs]
    will be in priority order (head as highest priority).
    The possible results are:
    - [[]], means either
      {ul {- BUILD_PATH_PREFIX_MAP is not set and [dir] is not an existing
      directory, or}
          {- if set, then there were no matching prefixes of [dir].}}
    - [Some dirs], means dirs are the directories found. Either
      {ul {- BUILD_PATH_PREFIX_MAP is not set and [dirs = [dir]], or}
          {- it was set and [dirs] are the mapped existing directories.}}
    - Not_found raised, means some source prefixes in the map
      were found that matched [dir], but none of mapping results
      were existing directories (possibly due to misconfiguration).
      The caller should catch this and issue an appropriate error
      message.
    See
    {{: https://reproducible-builds.org/specs/build-path-prefix-map/ }
    the BUILD_PATH_PREFIX_MAP spec}
    *)

So it is not just the first matching match that causes a rewrite, but we have access to all of the matching rewrites, in priority order.

Suppose that the abstract path to the directory with the contents of library 'lib1' of package pkga is /workspace_root/pkga/liba.

In a typical development situation, a source file in that library, say moda.ml, could be in one of three places:

For the purpose of the finding source, the debugger would like the find it in the source directory preferentially, if it is there (even though it is also in the build directory), but if not in those places, then in the opam/lib directory.

If the debugger instead wanted to find a custom printer '.cmo', then it would want to find it in the build directory, if there, otherwise in the opam/lilb directory.

If the developer's source layout mirrors the installation layout (i.e. package name at the top, then libraries and their contents), then we could define a simple deploy-phase BUILD_PATH_PREFIX_MAP with these mapping pairs (with decreasing priority)"

If the developer does not use that kind of layout, then Dune needs to come up with a mapping during the build phase to ensure that the debug information looks as if that were the layout. For example, if liba is in pkga, but the source layout is

then Dune should use a mapping pair:

This will ensure that build-time path blddir/lib/liba/moda.ml maps to abstract path /workspace_root/pkga/liba/moda.ml. This will ensure that when pkga is installed the debug information will be consistent with its installed location.

I believe the currently Dune maps the current working directory to '/workspace_root'. That would still be appropriate for modules that are not in a package and will not be installed. But for each module that will be installed, a mapping similar to the above is needed.

I'm willing to do the work to make the change if you can give me some hints. At the point where the BUILD_PATH_PREFIX_MAP is being set, how can I find out whether the module belongs to a package, and if so, what package? Or if you or someone else knows what to do, that would be fine.

rgrinberg commented 1 year ago

We could pick a different name for the environment variable when used in the deployment phase, but unless a single tool is doing both build-phase and deployment-phase activities at the same time, I don't see why the same environment variable could not be used for both phases. I'm assuming the BUILD_PATH_PREFIX_MAP environment variable is set freshly each time a tool that needs it is invoked.

Okay, as long as spec-following, existing tools that read BUILD_PATH_PREFIX_MAP aren't going to trip up by this mypackage= syntax, then it's fine by me.

A list of directories of related entites (mostly where source files are located).

Does it account for the kind of name mangling that dune does? E.g. foo.ml can correspond to a compilation unit Lib__Foo?

At the point where the BUILD_PATH_PREFIX_MAP is being set, how can I find out whether the module belongs to a package, and if so, what package? Or if you or someone else knows what to do, that would be fine.

First, you need to know what library the module belongs to. Given that the module is used as a dependency, dune should know the answer to that. From the library, you can determine the package if the package was installed by dune. If it wasn't installed by dune, then you can query opam.

I would suggest to take a look at bin/describe to see what kind of information can easily be reflected from a dune workspace.

richardlford commented 1 year ago

Does it account for the kind of name mangling that dune does? E.g. foo.ml can correspond to a compilation unit Lib__Foo?

It would need to. That is why Dune, which knows about such things, would be needed.

And now that you mention it, is such name mangling documented anywhere? Can you describe its purpose. As I recall, it is related to wrapping multiple modules in another. Is that right?

richardlford commented 1 year ago

I would suggest to take a look at bin/describe to see what kind of information can easily be reflected from a dune workspace.

Thanks for the tip. I've been studying the Dune sources but haven't looked at bin/describe yet. I did notice that the dune-package files have the kind of information that would be needed.

richardlford commented 1 year ago

See also the OCaml issue, https://github.com/ocaml/ocaml/issues/12106, and in particular this comment: https://github.com/ocaml/ocaml/issues/12106#issuecomment-1488864284.

The latter implies that for reproducible builds Dune should also map the location of the opam library.

richardlford commented 1 year ago

I added a PR with a test case for this issue, https://github.com/ocaml/dune/pull/7548.