Open jonahbeckford opened 11 months ago
> with-dkml file dune-project
dune-project: Unicode text, UTF-16, little-endian text, with CRLF line terminators
If this is accurate, then the BOM is only the start; I don't think Dune is able to decode UTF-16 text at the moment.
I imagine this is an issue for regular dune files as well. In any case, it seems reasonable for us to handle it.
Okay, there are two separate problems, both exposed on Windows. The first is that UTF-16 encoding is not yet supported. The second is that a BOM is not supported, even in UTF-8.
For example:
> with-dkml file dune-project
dune-project: Unicode text, UTF-8 (with BOM) text, with no line terminators
> with-dkml od -xc dune-project
0000000    bbef    28bf    616c    676e    6420    6e75    2065    2e33
        357 273 277   (   l   a   n   g       d   u   n   e       3   .
0000020    3231    0029
          1   2   )
0000023
> dune build
File "dune-project", line 1, characters 0-19:
1 | (lang dune 3.12)
^^^^^^^^^^^^^^^^^^^
Error: Invalid first line, expected: (lang <lang> <version>)
We will need a UTF decoder. The best one is in the standard library, but only in OCaml 4.14 and later.
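To illustrate what BOM-tolerant decoding would look like, here is a minimal sketch in Python (used only for illustration; Dune itself would use an OCaml decoder). The helper name `decode_dune_file` is hypothetical. It relies on Python's `utf-8-sig` codec, which decodes UTF-8 and skips a single leading BOM if one is present.

```python
def decode_dune_file(data: bytes) -> str:
    """Decode UTF-8 bytes, ignoring one leading byte-order mark if present.

    Hypothetical helper for illustration; raises UnicodeDecodeError on
    input that is not valid UTF-8 (for example a UTF-16 file).
    """
    return data.decode("utf-8-sig")


# The bytes from the od dump (EF BB BF followed by the lang stanza),
# and the same content without the BOM.
with_bom = b"\xef\xbb\xbf(lang dune 3.12)"
without_bom = b"(lang dune 3.12)"

# Both decode to the same text, so the first-line check would see
# "(lang dune 3.12)" in either case.
assert decode_dune_file(with_bom) == decode_dune_file(without_bom)
```

With this approach the BOM never reaches the parser, so the "Invalid first line" error would not be triggered by a UTF-8 file that happens to start with a BOM.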
Does OCaml even compile with utf16 .ml files? Most other compilers I know, gcc, rustc, don't accept any encoding other than utf8. The only ones I am aware of are Microsoft compilers and Java. It doesn't seem like a bug but more like a feature request to accept dune files in utf16 encoding.
Also from a cross platform perspective, if you want the dune files to work on other platforms, anything other than utf8 seems like a foot gun.
I've read that powershell allows you to configure the default piping output from utf16 to utf8. Apparently that is also the default on newer versions.
I'm tempted to say that echo is not the correct tool in this case, and that you should be passing PS arguments fixing the encoding. If this is for scripts, then this is probably fine, but for users, I wouldn't recommend writing Dune files this way and simply using an editor would be an improvement.
In the meantime, this is a good opportunity to improve the error message and say that we only accept utf8. Also updating the docs is a good idea.
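One possible shape for such a message, sketched in Python under the assumption that the check runs on the raw file bytes before parsing. The function name `encoding_hint` and the message wording are hypothetical, not Dune's.

```python
from typing import Optional


def encoding_hint(data: bytes) -> Optional[str]:
    """Return a human-readable hint if `data` is not plain UTF-8, else None.

    Hypothetical helper: the wording is only a suggestion.
    """
    # FF FE / FE FF are the UTF-16 little-endian / big-endian byte-order marks.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "this file appears to be UTF-16 encoded; only UTF-8 is accepted"
    # EF BB BF is the UTF-8 byte-order mark.
    if data[:3] == b"\xef\xbb\xbf":
        return "this file starts with a UTF-8 byte-order mark, which is not accepted"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8 at byte offset {exc.start}"
    return None


# A BOM-prefixed file gets a specific hint; a plain UTF-8 file gets none.
print(encoding_hint(b"\xef\xbb\xbf(lang dune 3.12)"))
print(encoding_hint(b"(lang dune 3.12)"))
```

A hint like this could be appended to the existing "Invalid first line" error so that the user learns the cause is the encoding rather than the stanza itself.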
Does OCaml even compile with utf16 .ml files?
I never even considered that. After testing UTF-16 ... no, it does not. But neither does it work with BOM-encoded UTF-8 ... which is valid UTF-8.
Anyway, Dune should not need to support encodings that OCaml does not support. Sadly, I can't find a specification for what encoding OCaml supports! I vaguely remember some thread that OCaml doesn't support UTF-8 source files ... I think it was only ISO 8859-1 or some Latin variant.
Flow chart:
dune init prj works, and no need for Windows users to install VS Code simply to get a tiny project started with the right file encodings). See https://youtu.be/33niX94tn3U?si=4xpOe6cSvW_dKGE1&t=605 (10:05 - 10:30) for how it is ugly today for Windows users without Visual Studio Code.
Aside: C has no spec for source code encoding, so gcc expecting UTF-8 makes sense. rustc was a better example. But more popular languages (JavaScript, C#, Python and, as you mentioned, Java) do support other source code encodings.
Sadly, I can't find a specification for what encoding OCaml supports! I vaguely remember some thread that OCaml doesn't support UTF-8 source files ... I think it was only ISO 8859-1 or some Latin variant.
OCaml has no spec for source code encoding: source file contents are interpreted as raw bytes. There is however an ongoing discussion to require UTF-8: see https://github.com/ocaml/ocaml/pull/1802 and the links mentioned there.
Okay ... will have you all on the Dune team decide whether UTF-8 (including optional BOM) is what Dune supports.
We'll support whatever OCaml does. I agree with your statement here:
Anyway, Dune should not need to support encodings that OCaml does not support
Expected Behavior
In PowerShell on Windows:
I would expect the project to build.
Actual Behavior
The reason? Windows conventionally has a byte-order mark (BOM) in its Unicode files. The built-in PowerShell (version 5.x or earlier) inserts BOMs; newer Command Prompts do not.
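As a possible workaround for files already written with a BOM, the file can be rewritten as BOM-less UTF-8. The sketch below is in Python and is only an illustration; `rewrite_without_bom` is a hypothetical helper name, and it assumes the file is either UTF-16 with a BOM or UTF-8 with or without one.

```python
import codecs
from pathlib import Path


def rewrite_without_bom(path: Path) -> None:
    """Rewrite `path` in place as UTF-8 without a byte-order mark.

    Hypothetical helper. Assumes the file is UTF-16 with a BOM, or UTF-8
    with or without a BOM; line terminators are left unchanged.
    """
    raw = path.read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # "utf-8-sig" drops a leading UTF-8 BOM if there is one.
        text = raw.decode("utf-8-sig")
    # Write bytes directly so that newlines are not translated.
    path.write_bytes(text.encode("utf-8"))


# Usage (path is an example):
# rewrite_without_bom(Path("dune-project"))
```

After the rewrite, the file starts directly with the bytes of `(lang`, which is what the first-line check expects.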
Specifications
dune (output of dune --version): 3.12.1