ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.57k stars 395 forks source link

melange: introduce `melange.root` stanza, to avoid duplicated .js artifacts #7017

Open jchavarri opened 1 year ago

jchavarri commented 1 year ago

At the moment, every melange.emit stanza generates its own version of the js artifacts produced by the libs it consumes. It there are N melange.emit stanzas using the same library foo, the js artifacts for foo will be found N times under the _build folder.

Besides unnecessary disk space usage, this is problematic for build tools like webpack, that become slower as they have to process the same files multiple times. It is also punitive for the resulting JavaScript bundle size, as it is not possible out of the box to reuse the same library js bundle in multiple places, unless one goes going through complex webpack configurations.

To avoid this, @rgrinberg suggested the idea of introducing a new stanza melange.root. This new stanza would define a target, and the javascript_extension and module_system used by that target, e.g.:

(melange.root
 (target tools_es6)
 (module_system es6)
 (javascript_extension js))

Then, afaiu, melange.emit stanzas that are spread across the project folders would be used to define "entry points", referencing the target already defined in melange.root, e.g.

(melange.emit
 (target tools_es6)
 (alias melange)
 (libraries lib))

The fields module_system and javascript_extension would be removed from melange.emit.

Some open questions:

jchavarri commented 1 year ago

A couple more open questions:

rgrinberg commented 1 year ago

Should melange.root be in dune or dune-project file?

Definitely in the dune file. Things shouldn't be in the dune-project unless for very concrete reasons.

Is it possible to solve the duplication problem without adding a new stanza?

It is possible with a workspace scanning step before the rules are loaded. In other words, not very appealing because it doesn't scale well with larger workspaces.

If melange.root was ultimately added, would it make sense to rethink stanza naming? e.g. melange.emit could be rather melange.entry_point or melange.executable, as it would not be really emitting anything anymore. And melange.root could be melange.emit instead.

Sounds fine to me.

Would target still be strictly needed in melange.emit? It is nice to reduce the amount of js targets to what the user strictly needs (e.g. tests use commonjs, source code uses es6), but the alternative (produce js targets for all the melange.root stanzas found) is also possible.

It would no longer be needed.

I am not sure I get how rules production would work with melange.root. If user calls dune build @melange in the example shared in description, dune would find the alias in melange.emit, then start looking from the melange.emit folder up the tree, until it finds a melange.root stanza, then produce the rules in that folder, right? I am not sure about the implications with copy_files and include_subdirs 🤔

Do you mean the rules for the .cmj or the rules for .js artifacts?

Thinking about this a little more, I'm not so sure we need melange.root at all. Wouldn't it be possible to cover your use case by:

  1. creating a library for every single "entry point" application
  2. writing a single melange.emit stanza with all the libraries that correspond to entry points.
jchavarri commented 1 year ago

Thinking about this a little more, I'm not so sure we need melange.root at all. Wouldn't it be possible to cover your use case by:

  1. creating a library for every single "entry point" application
  2. writing a single melange.emit stanza with all the libraries that correspond to entry points.

This ended up working for Ahrefs use case 🎉 However, maybe it would still be beneficial to create a new stanza, or rather split melange.emit into two stanzas.

Currently, melange.emit has two roles:

The idea would be to split the current roles of melange.emit into two stanzas:

It's unclear if the complexity is worth the effort though, as "abusing" library to use them in both cases (proper libraries but also entry points) seems to work fine.