dunhamsteve commented 3 weeks ago

Description

Currently in Idris, we can't add a new backend to an existing %foreign definition without editing the library containing the definition. This issue came up for me when I wanted to use Control.App in node and needed to stub out prim__fork for the node codegen. But it also arises when working with a third party backend.

In a discord discussion @stefan-hoeck suggested the following syntax to add additional bindings to an existing %foreign definition:

%foreign_impl Prelude.IO.prim__putStr "javascript:lambda:x=>..."

This implementation of that syntax follows the pattern of %foreign and allows an indented block of expressions that evaluate to strings. The test case uses the pragma to make Control.App work with the node codegen.

The general idea is to update the definition of the %foreign function to add the new bindings. We store the additional information in Options, accumulate it as TTC files are read in, and then update the defs in Compiler.Common. We can't directly modify the definition when the pragma is evaluated because the context is dropped after compiling the module.

Should this change go in the CHANGELOG?

[x] If this is a fix, user-facing change, a compiler change, or a new paper implementation, I have updated CHANGELOG_NEXT.md (and potentially also CONTRIBUTORS.md).

stefan-hoeck commented 3 weeks ago

Very nice! Thanks for taking the time to implement this.

dunhamsteve commented 3 weeks ago

Currently, it just aggregates the directives. The most recently loaded module is first. I was on the fence on whether this should be an error as I can see a case for overriding the declaration in the library and possibly a case for a backend wanting more than one declaration for a given name. So I took the simpler path and the handling of duplicates is left to the backend.

From auditing the code (I haven't experimented to verify):

The scheme and RefC backends go through the list of supported backends in order and pick the first match in the list. So it may pick the first "scheme,chez" and if there is none, it will pick the first "scheme", etc. This happens in parseCC.
- Chez looks for: ["scheme,chez", "scheme", "C__collect_safe", "C"]
- Racket looks for: ["scheme,racket", "scheme", "C"]
- RefC looks for: ["RefC","C"]
Javascript goes through the list and picks the first one that matches any supported backend (supported backends are either ["node","javascript"] or ["browser","javascript"]). This happens in searchForeign.

There is a little bit of ambiguity here, if two TTC are loaded that have an override for the same backend, the behavior depends on the load order. I do check that the name we override/augment is a %foreign, ignoring visibility, and give an error if it is not a foreign function.

Aside from what is already implemented, if we want to detect these conflicts, we could either give a warning or an error when there is a duplicate. We could check:

When processing the pragma
- This will miss the cases where two conflicting TTC are loaded independently
When loading the TTC file and merging it's foreignImpl into the context
- This approach gets us the name of the module with the conflicting definition
When merging the foreign_impl into the context in Compiler.Common.addForeignImpl
When a backend processes the list of bindings
- The backends currently choose the first match, but they could be changed
- This allows a backend to decide what to accept
- We don't have FC on the individual bindings

mattpolzin commented 3 weeks ago

I think this capability may be too useful not to merge even with some shortcomings (though non-determinism is particularly unfortunate). Long term, I wonder if foreign functions would be more ergonomic if an error for ambiguity could be given and a hide directive could be used to address that error.

dunhamsteve commented 3 weeks ago

For now, I have added documentation for how duplicate declarations are handled.

Note that it is also possible to introduce ambiguity with the existing compiler by doing:

%foreign "javascript:foo"
         "javascript:bar"
something : Nat -> Nat

idris-lang / Idris2

Add %foreign_impl pragma for augmenting ffi functions #3303

Description

Should this change go in the CHANGELOG?