ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.57k stars 396 forks source link

Feature: DUNE_PROJECT_SOURCE_PREFIX_MAP #10178

Open jonahbeckford opened 4 months ago

jonahbeckford commented 4 months ago

This is already implemented in https://github.com/jonahbeckford/dune/commit/812af6c58603ba77c7e26d4711f8ae1f189b441e. I'd like to get a feature review (and I have to wait for other things before I submit a PR for this). I can also demo this at a later date to the Dune team.

Desired Behavior

The DUNE_PROJECT_SOURCE_PREFIX_MAP environment variable behaves similarly to the build path prefix map. However, it focuses on source path modifications rather than build path modifications.

The use cases I have for this are:

  1. (today) autogenerating the dune build files into a single git-ignored directory tree, for a tidy project hygiene. (It would be nicer if that directory tree didn't have to exist at all, but Dune's Path.t abstraction forces dune files to be in the project directory). Those dune build files have (copy) stanzas to the original source code.
  2. (soon) Major transformations of the .ml files. For example, a system (being implemented right now) where naming a file xyz_windows_x86_64.ml would be considered to be xyz.ml for a Windows 64-bit build. This is just a slight extension to the (copy) of the first use case, where filenames are adjusted during the (copy).

This commit focuses on the Merlin experience.

Example

Given a tiny project:

<project>/src/X/A.ml
<project>/dune-project <-- auto-generated and git-ignored
<project>/#s/X/dune <-- auto-generated and git-ignored
<project>/_build/default/#s/X/A.ml <-- (copy) of src/X/A.ml

Merlin should see a Merlin configuration for src/X/A.ml even though the (executable) was for #s/X/A.ml.

A #s=src DUNE_PROJECT_SOURCE_PREFIX_MAP accomplishes that.

Notice a few things:

  1. The user only needs to enter src/X/A.ml. No dune file needs to be written.
  2. Obvious drawback that there is an extra copy.
  3. As the project grows large with .ml files, the generated dune files all stay in a single tree #s that is git-ignored.
  4. I haven't shown it in that example but my use-cases are multiple script "roots" (like src/, tests/, assets/bitmaps/, etc.) to hold content.
rgrinberg commented 4 months ago

If I'm getting this right, the major use case for this feature is to provide your own metadata that desugars into dune and dune-project files. If so, I don't think this proposal is enough because the error messages emitted by dune would still contain the generated paths in the error messages.

Major transformations of the .ml files. For example, a system (being implemented right now) where naming a file xyz_windows_x86_64.ml would be considered to be xyz.ml for a Windows 64-bit build. This is just a slight extension to the (copy) of the first use case, where filenames are adjusted during the (copy).

Don't we already cover this? At least partially. Dune does try to feed the original source to merlin after the copy stanzas have been resolved.

jonahbeckford commented 4 months ago

My sequence of steps is helpful in my reply below:

  1. First did a code review. Discovered that (copy#) should influence Merlin.
  2. Tried using (copy#). Saw that it worked with adjusting the Dune error messages (with caveats mentioned below). That was an unexpected but pleasant surprise. However, Dune/Merlin was reporting that No config found for file xxxx. Try calling 'dune build'.
  3. Did a lot of Format.eprintf debugging. Discovered gaps primarily in ocaml_merlin.ml:load_merlin_file.
  4. Wrote the first commit of the DUNE_PROJECT_SOURCE_PREFIX_MAP implementation.
  5. Asked for a feedback review before I possibly waste time completing the DUNE_PROJECT_SOURCE_PREFIX_MAP implementation ... if the team thinks the overall idea is bad.

(me) This commit focuses on the Merlin experience.

If so, I don't think this proposal is enough because the error messages emitted by dune would still contain the generated paths in the error messages.

Yes, error messages are not covered by the first commit. However, I wasn't trying to submit a full PR.

Side-note. As mentioned already, using (copy#) "fixes" the error messages for Dune. However, that has had undesirable side-effects:

  1. (Critical) The character positions in the error locations are not correct. It is quite jarring when the locations don't match up in an editor.
  2. (Less important) On Windows the backslashes in the error locations are incorrectly rendered as double backslashes (ex. C:\\Test\\Foo.txt). Eventually I'd like to use double backslashes correctly ... using double backslashes in Windows UNC paths like \\.\C:\Test\Foo.txt avoids the 255 max length FAT32-based file path.

So yes, a second commit to adjust the error message paths using DUNE_PROJECT_SOURCE_PREFIX_MAP would fix the problems above.


Don't we already cover this? At least partially. Dune does try to feed the original source to merlin after the copy stanzas have been resolved.

As suggested by my sequence of steps, whatever code was there was not sufficient for Merlin to work.

More generally, the mechanism of adding a source-code-location header line seems very likely to break PPX location reporting.

Perhaps I'm wrong though ... so asking the experts!

jonahbeckford commented 4 months ago

the major use case for this feature is to provide your own metadata that desugars into dune and dune-project files

More than that. I'll make a full announcement hopefully within a week. But the major use case is providing an OCaml scripting experience that a) does not need require the user to write any configuration files, and b) can work in vscode with the vscode-ocaml-platform plugin without the user pre-installing OCaml + C compilers. The "a)" is why I need something like DUNE_PROJECT_SOURCE_PREFIX_MAP (and codept dependency analysis to discover metadata). I was going to ditch Dune entirely for this, but the first Dune-backed implementation is looking cleaner than I first suspected. And using Dune makes it easy to publish these script-based projects to an opam repository.

rgrinberg commented 4 months ago

Thanks for describing your use cases more thoroughly. This proposal tries to kill two birds with one stone by proposing a single construct both fixes and introduces a new feature. While that might be admirable, I actually think that it's not the way to go. As I've mentioned before, it's only a partial implementation of the desired feature. And the bug that it fixes, well I think that it should be fixed unconditionally without any manual intervention from users.

So I think we need to get to the bottom of why merlin is broken. AFAIK, it should respect the location directives (right @voodoos ?).

Your feature request is valid, but it needs more design to make sure the issues of locations is covered. At minimum, this could be covered by introducing location directives into the dune language so that you can make dune point back to your original metadata

Although if I understand the purpose of your project correctly, you're only using dune as an implementation detail. If that's the case, perhaps you should just consider forking dune and ripping our frontend and reading the build metadata in whatever way that works for you.

More generally, the mechanism of adding a source-code-location header line seems very likely to break PPX location reporting.

That's funny, because the very purpose of these header lines was to accommodate preprocessors. If they don't work for that, what good are they for then I wonder.

jonahbeckford commented 4 months ago

Good points. Thanks for the feedback. I'll continue maintaining my patch in a fork until the underlying bugs are fixed.

Things to clarify:

you're only using dune as an implementation detail

I decided to use dune as my first implementation because it works well in the ocaml ecosystem, especially for publishing to opam, but also for generating native code executables and for using in an opam-monorepo style project (which I use extensively for cross-compiling). Those are features, not implementation details. That doesn't mean I won't also have a ninja-generator at some point, but that ninja-generator will complement not replace the need for a dune-generator.

Forking will make the above very difficult. But I can wait since I don't need those features today, and hopefully after the community announcement others can live without those features for a while. More importantly I want to get the integration done right, so thanks again for the feedback.

First bug

So I think we need to get to the bottom of why merlin is broken. AFAIK, it should respect the location directives

Agree. But the first breakage is happening inside Dune.

The relevant project structure is:

$ tree . -I Xyz -I "*002*" -I "*200*" -I "*300*" -I dune-project -I "dk*" -I "*.md" -I cmake -I log -I install -I ".git*" -I .dune -I .lock -I byte -I .vscode -I .db -I .digest-db -I .filesystem-clock -I "*.eobjs" -I "META.*" -I "*.install" -I "*.dune-package" -a
.
├── #s
│   └── DkHelloScript_Std
│       └── dune
├── _build
│   └── default
│       └── #s
│           └── DkHelloScript_Std
│               ├── .merlin-conf
│               │   └── exe-Example001
│               ├── Example001.bc
│               ├── Example001.ml
│               └── Example001.mli
└── src
    └── DkHelloScript_Std
        └── Example001.ml

with

; #s/DkHelloScript_Std/dune
(rule
  (target Example001.ml)
  (action (
    copy#
    "Y:\\source\\DkHelloScript\\src\\DkHelloScript_Std\\Example001.ml"
    %{target})))
(executable
  (name    Example001)
  (modes   byte)
  (modules Example001))
(install
  (package DkHelloScript_Std_Example001)
  (section bin)
  (files (Example001.bc as DkHelloScript_Std_Example001.bc)))

The following is essentially the call when I hover over src/DkHelloScript_Std/Example001.ml:

let file = {|_build/default/src/DkHelloScript_Std/Example001.ml|}
let _ = Ocaml_merlin.load_merlin_file file

The search pattern in Ocaml_merlin.load_merlin_file searches the following directories with get_merlin_files_paths:

find_closest path=_build/default/src/DkHelloScript_Std/
find_closest path=_build/default/src/
find_closest path=_build/default/
find_closest path=_build/

But it should be doing:

let file = {|_build/default/#s/DkHelloScript_Std/Example001.ml|}
let _ = Ocaml_merlin.load_merlin_file file

so that the search pattern is:

find_closest path=_build/default/#s/DkHelloScript_Std/
   ==>
   get_merlin_files_paths > find_map file_path=_build/default/#s/DkHelloScript_Std/.merlin-conf/exe-Example001

Second bug

The second breakage is in Merlin. The contents of _build\default\#s\DkHelloScript_Std\.merlin-conf\exe-Example001 are:

((STDLIB Y:/source/dksdk-coder/.ci/o/dkml/lib/ocaml)
 (EXCLUDE_QUERY_DIR)
 (B
  "Y:\source\DkHelloScript\_build/default/#s/DkHelloScript_Std/.Example001.eobjs/byte")
 (S "Y:\source\DkHelloScript\#s/DkHelloScript_Std")
 (FLG
  (-w
   @1..3@5..28@30..39@43@46..47@49..57@61..62@67@69-40
   -strict-sequence
   -strict-formats
   -short-paths
   -keep-locs
   -g)))

Edit:

If I don't change the (S "Y:\source\DkHelloScript\#s/DkHelloScript_Std") statement to (S "Y:\source\DkHelloScript\src/DkHelloScript_Std") I get No config found for file ....

In my first commit I do that with a post-processing sexp transformation Ocaml_merlin.invert_merlin_source_dirs. But I don't think post-processing is correct ... but I don't really understand the root issue for this one.

jonahbeckford commented 3 months ago

Breaking this issue apart ...

So I think we need to get to the bottom of why merlin is broken. AFAIK, it should respect the location directives

The problem is exposed at the bottom of the test/blackbox-tests/test-cases/merlin/merlin-locate.t/run.t test I added in #10381:

https://github.com/ocaml/dune/blob/2f47fed767aa191a9d8fc448b940c32421acf959/test/blackbox-tests/test-cases/merlin/merlin-locate.t/run.t#L39-L51

If possible, please check if it is a real problem and if so an issue can be cut to Merlin.