Open richardlford opened 1 year ago
BUILD_PATH_PREFIX_MAP="/workspace_root/mypackage=$cwd/lib"
BUILD_PATH_PREFIX_MAP
is a documented standard, I don't think we can just modify its meaning and syntax like that. Can you introduce another variable instead?
Also, could you clarify all the information that ocamldebug wants to look up in the first place. We have three different things:
What exactly does ocamldebug need?
BUILD_PATH_PREFIX_MAP="/workspace_root/mypackage=$cwd/lib"
BUILD_PATH_PREFIX_MAP
is a documented standard,
[If you were concerned by the $cwd in the example above, I did not mean to imply that BUILD_PATH_PREFIX_MAP should interpret dollar signs. I was just using it as shorthand for a path holding the current working directory.]
As you say, BUILD_PATH_PREFIX_MAP
is a documented standard. The standard defines a general kind of mapping but does not specify the areas of application. The document focuses on reproducible builds and using the mappings to map from build-time absolute paths to abstract paths. I call that the "build phase".
But once reproducible artifacts are produced (containing abstract paths), the users of the artifacts need a way to give meaning to the abstract paths. That is also a kind of mapping, so the suggestion is, why not just invert the original mapping? We call the phase where the reproducible artifacts are being used, the "deployment phase".
Over in compiler-land, we've had a bit of discussion about this. See https://github.com/ocaml/ocaml/pull/12126. And a recently merged PR, https://github.com/ocaml/ocaml/pull/12138, added support for logically inverting a mapping.
I don't think we can just modify its meaning and syntax like that. Can you introduce another variable instead?
We are not modifying the syntax, but we are giving it some new semantics for use during the deployment phase. It still is holding mappings from right to left, and with right-most to left-most priority.
The compiler utilities in the Build_path_prefix_map module all take a map as a parameter, so they do not care where the mapping came from. On the other hand, the utilities in Location do look for the BUILD_PATH_PREFIX_MAP environment variable. We could pick a different name for the environment variable when used in the deployment phase, but unless a single tool is doing both build-phase and deployment-phase activities at the same time, I don't see why the same environment variable could not be used for both phases. I'm assuming the BUILD_PATH_PREFIX_MAP environment variable is set freshly each time a tool that needs it is invoked.
Also, could you clarify all the information that ocamldebug wants to look up in the first place. We have three different things:
- Object files (cmo/cma)
- Interface files (cmi)
- Source files (ml, mli)
What exactly does ocamldebug need?
The debug information coming into ocamldebug has, for each compilation unit in the executable:
ocamldebug uses the Load_path module to look things up. It starts it out with the location of the stdlib, and with ".". Then it adds the directories from the debug information. Users can also explicitly add directories to it.
ocamldebug uses the Load_path to find:
The deployment-phase provides some expanded interpretation of the mapping. In particular, Location.rewrite_find_all_existing_dirs
.
val rewrite_find_all_existing_dirs: string -> string list
(** [rewrite_find_all_existing_dirs dir] accumulates a list of existing
directories, [dirs], that are the result of mapping a potentially
abstract directory, [dir], over all the mapping pairs in the
BUILD_PATH_PREFIX_MAP environment variable, if any. The list [dirs]
will be in priority order (head as highest priority).
The possible results are:
- [[]], means either
{ul {- BUILD_PATH_PREFIX_MAP is not set and [dir] is not an existing
directory, or}
{- if set, then there were no matching prefixes of [dir].}}
- [Some dirs], means dirs are the directories found. Either
{ul {- BUILD_PATH_PREFIX_MAP is not set and [dirs = [dir]], or}
{- it was set and [dirs] are the mapped existing directories.}}
- Not_found raised, means some source prefixes in the map
were found that matched [dir], but none of mapping results
were existing directories (possibly due to misconfiguration).
The caller should catch this and issue an appropriate error
message.
See
{{: https://reproducible-builds.org/specs/build-path-prefix-map/ }
the BUILD_PATH_PREFIX_MAP spec}
*)
So it is not just the first matching match that causes a rewrite, but we have access to all of the matching rewrites, in priority order.
Suppose that the abstract path to the directory with the contents of library 'lib1' of package pkga
is /workspace_root/pkga/liba
.
In a typical development situation, a source file in that library, say moda.ml
, could be in one of three places:
opamroot/lib/pkga/liba/moda.ml
.opam/lib
.For the purpose of the finding source, the debugger would like the find it in the source directory preferentially, if it is there (even though it is also in the build directory), but if not in those places, then in the opam/lib
directory.
If the debugger instead wanted to find a custom printer '.cmo', then it would want to find it in the build directory, if there, otherwise in the opam/lilb
directory.
If the developer's source layout mirrors the installation layout (i.e. package name at the top, then libraries and their contents), then we could define a simple deploy-phase BUILD_PATH_PREFIX_MAP with these mapping pairs (with decreasing priority)"
If the developer does not use that kind of layout, then Dune needs to come up with a mapping during the build phase to ensure that the debug information looks as if that were the layout. For example, if liba
is in pkga
, but the source layout is
then Dune should use a mapping pair:
This will ensure that build-time path blddir/lib/liba/moda.ml
maps to abstract path /workspace_root/pkga/liba/moda.ml
. This will ensure that when pkga
is installed the debug information will be consistent with its installed location.
I believe the currently Dune maps the current working directory to '/workspace_root'. That would still be appropriate for modules that are not in a package and will not be installed. But for each module that will be installed, a mapping similar to the above is needed.
I'm willing to do the work to make the change if you can give me some hints. At the point where the BUILD_PATH_PREFIX_MAP is being set, how can I find out whether the module belongs to a package, and if so, what package? Or if you or someone else knows what to do, that would be fine.
We could pick a different name for the environment variable when used in the deployment phase, but unless a single tool is doing both build-phase and deployment-phase activities at the same time, I don't see why the same environment variable could not be used for both phases. I'm assuming the BUILD_PATH_PREFIX_MAP environment variable is set freshly each time a tool that needs it is invoked.
Okay, as long as spec-following, existing tools that read BUILD_PATH_PREFIX_MAP
aren't going to trip up by this mypackage=
syntax, then it's fine by me.
A list of directories of related entites (mostly where source files are located).
Does it account for the kind of name mangling that dune does? E.g. foo.ml
can correspond to a compilation unit Lib__Foo
?
At the point where the BUILD_PATH_PREFIX_MAP is being set, how can I find out whether the module belongs to a package, and if so, what package? Or if you or someone else knows what to do, that would be fine.
First, you need to know what library the module belongs to. Given that the module is used as a dependency, dune should know the answer to that. From the library, you can determine the package if the package was installed by dune. If it wasn't installed by dune, then you can query opam.
I would suggest to take a look at bin/describe to see what kind of information can easily be reflected from a dune workspace.
Does it account for the kind of name mangling that dune does? E.g.
foo.ml
can correspond to a compilation unitLib__Foo
?
It would need to. That is why Dune, which knows about such things, would be needed.
And now that you mention it, is such name mangling documented anywhere? Can you describe its purpose. As I recall, it is related to wrapping multiple modules in another. Is that right?
I would suggest to take a look at bin/describe to see what kind of information can easily be reflected from a dune workspace.
Thanks for the tip. I've been studying the Dune sources but haven't looked at bin/describe yet. I did notice that the dune-package files have the kind of information that would be needed.
See also the OCaml issue, https://github.com/ocaml/ocaml/issues/12106, and in particular this comment: https://github.com/ocaml/ocaml/issues/12106#issuecomment-1488864284.
The latter implies that for reproducible builds Dune should also map the location of the opam library.
I added a PR with a test case for this issue, https://github.com/ocaml/dune/pull/7548.
Expected Behavior
Consider a simple project with one library and a single package,
mypackage
.Here is the source layout:
Here is an excerpt from the dune-project file:
And here is the "lib/dune" file:
I did a
dune install --prefix _opamroot
and here is an excerpt of the layout it produced:Now, when the debugger is debugging a program that used this library, we would want the debug information from the library to be:
/workpace_root/mypackage/lib1
so that after the debugger does a mapping from
/workspace_root
toopamroot/lib
, that the directory will becomeopamroot/lib/mypackage/lib1
which is where the source code and '.cma' file are located.
Actual Behavior
The following debug directories were produced in the 'lib1.cma':
The problem is that the debug information is missing information about which package contains the library. When building a module that is part of a package that will be potentially installed, the mapping should include the package name, and also be as if the library location was at the root of the package. So the
BUILD_PATH_PREFIX_MAP
in this case should be:BUILD_PATH_PREFIX_MAP="/workspace_root/mypackage=$cwd/lib"
Reproduction
I will open a PR soon, but I think the above describes the issue adequately. Also, I plan to work on a fix for this Dune problem, and am also working on a Dune enhancement to start up
ocamldebug
with aBUILD_PATH_PREFIX_MAP
to maps from/workspace_root
to the actual source locations.Specifications
dune
: 3.7.0ocaml
: 5.0.0