ocaml-sys / config.ml

conditional compilation via attributes for OCaml
Other
27 stars 4 forks source link

OCaml 5 requirement #17

Closed jonahbeckford closed 4 weeks ago

jonahbeckford commented 7 months ago

Context: In DkCoder https://discuss.ocaml.org/t/ann-dkcoder-0-1-0/14327 I'm exposing OS/ABI from https://github.com/diskuv/dkml-c-probe as environment variables to scripts, and I've been planning on exposing a cfg mechanism.

For a lot of reasons 1 I'll have to fork but thought I would ask regardless. Is there something about config that needs OCaml 5.1+?

1 OCaml 5 / dkml-c-probe / whitelist control over envvars / restriction to modules for integration with codept / ...

linear[bot] commented 7 months ago

RIOT-24 OCaml 5 requirement

leostera commented 7 months ago

Hi @jonahbeckford! Thanks for opening the issue ✨

We can loosen the restriction down to OCaml LTS (4.14.2 I think?).

Re: the other things, I think these are reasonable feature requests too. Specifically allow-listing env vars (or just plain integration with .env files) I'd be very open to.

Could you help me understand what you need for dkml-c-probe and the module restriction?

jonahbeckford commented 7 months ago

dkml-c-probe

dkml-c-probe is a fundamental building block for my Android/Apple cross-compilers and a lot of DkML infrastructure. For example, several teams use dkml-workflows to test MSVC, and those ABI names like windows_x86_64, linux_x86 and android_arm32v7a come from dkml-c-probe. Someone asked me to add in more BSD flavors so it is being used elsewhere as well.

Mitigation: I am fine with a new version of dkml-c-probe that also emits Rust target_os, target_arch and target_family values ... it is extensible and always backwards-compatible. But having the DkML ABI available is make-or-break.

jonahbeckford commented 7 months ago

module restriction

I use codept as a base for security features in DkCoder. The mental model is that code like:

module A = struct
  let f () = 3
  module B = struct end
end

let (_ : int) = A.f ()

module C = struct end

gets stripped of all values and classes so that all that remains are modules and module types:

module A = struct
  module B = struct end
end
module C = struct end

I use codept before compilation to quickly analyze which modules are used. Missing modules are auto-downloaded from packages if the package has been blessed by me or is listed by the user in a project file (convenience), and compilation is aborted if any modules are used that don't match a policy (security). Either way, the analysis part is based on modules only.

PPX-es like your config PPX still run first on the AST, but as you can guess all non-module extension points will be lost.

Anyway, it is quite technically possible to keep the cfg extension on AST values in addition to AST module structures. But it will be quite confusing for the user since the module-level cfg has significantly different benefits and side-effects (like cfg conditionally downloading missing modules). I'd much rather fork than make it more difficult for users to have a consistent mental model of what is happening.

Mitigation: I have one super PPX that aggregates an "official" (unchanging) set of PPX-es available to DkCoder users. I do this because I'm opinionated towards user simplicity, and technically because Dune does not support (preprocessing (pps ...)) in a pure bytecode environment. So ... since I'm fully in control of the arguments passed to PPX-es that are compiled with DkCoder, I can add whatever Ppxlib.Driver.add_arg are necessary for the config PPX.

leostera commented 7 months ago

Interesting! Thanks for sharing :)

I am fine with a new version of dkml-c-probe that also emits Rust target_os, target_arch and target_family values ... it is extensible and always backwards-compatible. But having the DkML ABI available is make-or-break.

I'd be open to defining some set of aliases so that target_arch could be fed from a different environment source. I think I'd prefer this to be defined in a Config.toml file, but I'm open to other options that don't impact the DevEx.

But it will be quite confusing for the user since the module-level cfg has significantly different benefits and side-effects (like cfg conditionally downloading missing modules). I'd much rather fork than make it more difficult for users to have a consistent mental model of what is happening.

If i understand correctly, you want to avoid your users from writing code like:

module A = struct
  (* ... *)
end
[@@cfg disabled]

module B = struct
  (* code here that depends on A *)
end

since if cfg does in fact include the module A you'll just use it, but if it doesn't include it and it is also used in B and if the module A belongs to some list of blessed modules, it will be downloaded.

how would you expect cfg to behave so that this mental model would be more straightforward?

jonahbeckford commented 7 months ago

Actually, I am fine with that example of yours. It works for two reasons:

  1. User perspective: The @@cfg attribute/extension is applied to the module, and that is all I'd ever expect a DkCoder user to understand with @@cfg: a module is there, or it is not.
  2. Technical perspective: The codept analysis can only see modules, so @@cfg on modules is good.
(* good *)
module A = struct
  (* ... *)
end
[@@cfg any(target_os = "windows")]

In contrast:

  1. User perspective: Applying @@cfg to let definitions, types and record fields IMHO fall into the category of allowing several different ways of doing the same thing. Not good when my minimum bar is high schoolers. (Even @@config versus @@cfg has to be explained to someone reading the code, and the less explaining the better.) They can all be done with a module level @@cfg instead. (See below)
  2. Technical perspective: The codept analysis can only see modules (and module types), so @@cfg on let definitions, types and record fields is invisible to my security analysis and can't participate in the module downloading.
(*  Will remove the following section of code because:
    a) Redundant for user
    b) Invisible to codept analysis *)

let msg () = "Hi. Welcome to your Windows program."
[@@cfg any(target_os = "windows")]

type filepath = { unc_path: string; canonical_path: string; dos_8_3_path: string option }
[@@cfg any(target_os = "windows")]

let msg () = "Hi. Welcome to your Unix program."
[@@cfg any(not(target_os = "windows"))]

type filepath = { path: string }
[@@cfg any(not(target_os = "windows"))]

(* Instead ... just use @@cfg with modules.
    Side-benefit: The code is clearer / easier to read. *)

module Filesystem = struct
    let msg () = "Hi. Welcome to your Windows program."
    type filepath = {
        unc_path: string; (** [ \\servername\path ] *)
        canonical_path: string; (** [ C:\Program Files\Adobe ] *)
        dos_8_3_path: string option (** [ C:\Program~1\Adobe ] *) }

    (* this [include] is module-level so alternatively could place @cfg
       on the [include] *)
    include Os_windows
end
[@@cfg any(target_os = "windows")]

module Filesystem = struct
    let msg () = "Hi. Welcome to your Unix program."
    type filepath = { path: string }

    include Os_unix
end
[@@cfg any(not(target_os = "windows"))]

Also I don't think module file exclusions will work in DkCoder:

(* If the module is visible on the filesystem, it will be visible to ocaml tools like codept.
    Does this really translate to an empty module? *)
[@@@cfg (should_include = "no")]

For clarity about what I mean by "fork", I'm being super-conservative and only including portions of two PPX-es today:

  1. a small subset of the ppx deriving (just show, eq and ord)
  2. ocaml-monadic which itself is a subset of ppx_let from Jane Street. I can't use Jane Street b/c they are explicit they don't test+support Windows and also b/c https://discuss.ocaml.org/t/ann-v0-16-release-of-jane-street-packages/12398 seems to imply OCaml 4.x will be dropped at v0.17/v0.18

In both cases I am making it possible to re-use the OCaml code outside of DkCoder by using a strict subset of what is available today. My expectation is to do the same with cfg, hopefully with the use of ppx arguments I supply (ie. not mandating user write a .toml or .json config file) so I don't have to do a real fork.

If you want, I can submit a PR for ppx arguments that restrict the feature set of cfg (plus add whitelist + environment overrides for target_os/target_arch). (It won't be soon, but I'll get to it).

leostera commented 4 weeks ago

@jonahbeckford did you end up using config.ml? I think that PR would make sense :)

jonahbeckford commented 4 weeks ago

Oops. I forgot to follow up with this. I'm still in design, but I'm fairly sure I'll end up with a pure module approach with no annotations for conditional modules. Just a naming convention for conditional modules and functor arguments. That fits better with the direction I am going with DkCoder.

Sorry for taking your time. This was fruitful though!