miking-lang / miking

Miking - the meta viking: a meta-language system for creating embedded languages

Call For Requirements: MExpr Externals System #832

Open johnwikman opened 7 months ago

johnwikman commented 7 months ago

Following recent Miking meeting discussions about the new system for externals, we decided on opening this GitHub issue to allow for async and offline discussion about what we want for the new externals system.

The idea is that we all contribute the requirements we would like to start with. We then sift through the list to get a better idea of what we want from MExpr's externals system.

This issue only considers the MExpr aspect of the externals system. MLang and other high-level DSLs may be used for motivation, but the proposed design should be focused on MExpr only. To avoid further bikeshedding, the focus is on requirements of the new externals system rather than its syntax.

Each suggested requirement must include at least the following (not necessarily in bullet form):

  1. A description of the requirement.
  2. A motivation for why this should be a requirement. This should be present regardless of how obvious it may seem. The motivation may be presented as descriptive text, a motivating example, or a combination of the two.

Suggest one or more requirements as a response to this issue. I will then add a link to each requirement in the list below. A response can contain multiple requirements, in which case I will add multiple links to that same response (one link per requirement).

List of collected requirements:

  1. All necessary information for an external call should be given as input to the compiler at compile/evaluation time
  2. Syntax of declared externals should not assume any specific backend
  3. Default/fallback MCore implementations of externals
  4. Semantic specifications for Externals
  5. Ability to reason about external definitions as intrinsic functions
  6. Convey type information to and from external types

johnwikman commented 7 months ago

Here are some basic proposed requirements from my side to get started with.

All necessary information for an external call should be given as input to the compiler at compile/evaluation time

A user should be able to declare an external function call in MExpr without having to rebuild the compiler or having to otherwise change a global configuration of the compiler. Some motivations for this:

Externals should be declared as expressions in the compiled source program itself (in the compiled MExpr AST). Certain platform specific aspects such as library linking paths may be more suited as compile-time flags or environment variables rather than as information in the program.

Syntax of declared externals should not assume any specific backend

As a framework, Miking should be able to target multiple backends. One key idea that has been discussed is that MExpr should have a stable syntax. At the same time, MExpr should be able to add support for more backends over time. Hence the syntax of declared externals should not depend on the targeted backend.

johnwikman commented 7 months ago

Another requirement that I would like to add is that it should be possible to provide a "default" fallback for defined externals. As an example, if I want to include BigInt functionality in a C target, then I probably want to use the gmp library. For backends that don't support BigInt, we could implement it directly in MCore, so that the functionality is at least available without having to modify the compiled program.

This also puts some restrictions on the external type here, since the representation of BigInt has to be chosen depending on the implementation used.

Another motivating example is to provide efficient implementations for special operations. Suppose that we have an external type Tensor and an external type NDarray, which between them implement tensor2seq, seq2tensor, ndarray2seq, and seq2ndarray. We can use these to implement a conversion tensor2ndarray that goes through an intermediate sequence. This works but is likely very costly. Certain backends may have a very efficient direct translation between Tensor and NDarray, and we could use that instead when it is available.
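To make the fallback idea concrete, here is a rough sketch in C. Everything in it is hypothetical: the struct layouts, the `BackendOps` table, and the helper names are invented for illustration and say nothing about how Miking represents these types. A backend may supply a direct `tensor2ndarray`; if it does not, the conversion falls back to composing `tensor2seq` and `seq2ndarray`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the external types; not Miking's representation. */
typedef struct { size_t n; double *data; } Seq;
typedef struct { size_t n; double *data; } Tensor;
typedef struct { size_t n; double *data; } NDArray;

/* Copy helper used by the generic conversions. Returns 0 on success, -1 on failure. */
static int copy_buf(size_t n, const double *src, double **dst)
{
    *dst = malloc(n ? n * sizeof **dst : 1);
    if (*dst == NULL)
        return -1;
    if (n)
        memcpy(*dst, src, n * sizeof **dst);
    return 0;
}

static int tensor2seq(const Tensor *t, Seq *s)
{
    s->n = t->n;
    return copy_buf(t->n, t->data, &s->data);
}

static int seq2ndarray(const Seq *s, NDArray *a)
{
    a->n = s->n;
    return copy_buf(s->n, s->data, &a->data);
}

/* A backend may supply a direct conversion; NULL means "not available". */
typedef struct {
    int (*tensor2ndarray)(const Tensor *, NDArray *);
} BackendOps;

/* Use the native conversion when present, otherwise go through Seq. */
static int tensor2ndarray(const BackendOps *ops, const Tensor *t, NDArray *a)
{
    assert(ops != NULL && t != NULL && a != NULL);
    if (ops->tensor2ndarray != NULL)
        return ops->tensor2ndarray(t, a);

    Seq tmp;
    if (tensor2seq(t, &tmp) != 0)
        return -1;
    int rc = seq2ndarray(&tmp, a);
    free(tmp.data);  /* the intermediate copy is the cost the fallback pays */
    return rc;
}
```

The caller never needs to know which path was taken, which is the property the requirement asks for: the compiled program stays the same and only the backend table differs.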

aathn commented 6 months ago

Another good thing about "default" implementations for externals is that they could act as semantic specifications that backends should conform to. That feature (semantic specifications for externals) is something I think would be good to have in one form or another.

br4sco commented 6 months ago

One use case I would like to highlight is where you need to reason about the external as just another function in your AST. In particular, when implementing AD or partial evaluation, you want to lift elementary functions to dual numbers (or similar), or statically evaluate the function. It is, of course, possible to match on identifiers, but that is a bit inconvenient compared to matching on Const constructors.
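The difference between the two styles of matching can be illustrated with a toy AST in C. This is only a sketch: the `Term` type, the constant tags, and the identifier string `"externalSin"` are made up for the example and are not taken from the Miking code base.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mini-AST: a term is either a named variable or a built-in constant. */
typedef enum { TM_VAR, TM_CONST } TermKind;
typedef enum { C_SIN, C_COS, C_EXP } ConstOp;

typedef struct {
    TermKind kind;
    const char *ident;  /* meaningful when kind == TM_VAR   */
    ConstOp op;         /* meaningful when kind == TM_CONST */
} Term;

/* Matching on identifiers: depends on a naming convention and a string compare. */
int is_sin_by_name(const Term *t)
{
    assert(t != NULL);
    return t->kind == TM_VAR && strcmp(t->ident, "externalSin") == 0;
}

/* Matching on a constant constructor: a plain tag comparison. */
int is_sin_by_const(const Term *t)
{
    assert(t != NULL);
    return t->kind == TM_CONST && t->op == C_SIN;
}
```

A transformation such as AD would need one check of the second kind per elementary function, and the compiler can verify that every tag is handled. The identifier-based check offers no such guarantee.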

johnwikman commented 6 months ago

> One use case I would like to highlight is where you need to reason about the external as just another function in your AST. In particular, when implementing AD or partial evaluation, you want to lift elementary functions to dual numbers (or similar), or statically evaluate the function. It is, of course, possible to match on identifiers, but that is a bit inconvenient compared to matching on Const constructors.

Could you clarify what you mean about just another function in the AST? Do you mean as an intrinsic function, a lambda expression, or something else?

br4sco commented 6 months ago

> > One use case I would like to highlight is where you need to reason about the external as just another function in your AST. In particular, when implementing AD or partial evaluation, you want to lift elementary functions to dual numbers (or similar), or statically evaluate the function. It is, of course, possible to match on identifiers, but that is a bit inconvenient compared to matching on Const constructors.
>
> Could you clarify what you mean about just another function in the AST? Do you mean as an intrinsic function, a lambda expression, or something else?

Yes, I mean intrinsic function.

johnwikman commented 6 months ago

> > > One use case I would like to highlight is where you need to reason about the external as just another function in your AST. In particular, when implementing AD or partial evaluation, you want to lift elementary functions to dual numbers (or similar), or statically evaluate the function. It is, of course, possible to match on identifiers, but that is a bit inconvenient compared to matching on Const constructors.
> >
> > Could you clarify what you mean about just another function in the AST? Do you mean as an intrinsic function, a lambda expression, or something else?
>
> Yes, I mean intrinsic function.

An interesting aspect of this is how it may generalize externals for interpreted mode and boot. But we should probably leave this kind of discussion for when we start with the design.

johnwikman commented 2 months ago

One thing that would be useful when working with external types is to allow external types to also contain other Miking types.

As an example, suppose I want to define this external "multi-device" array type:

type MultideviceArray k = "<external type definition here>"

let mdaMake = lam n. "<external function definition here>"

with the following underlying implementation in C:

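/* Initialize an array of n elements, each elemsize bytes wide.
   Returns 0 on success and -1 if the allocation fails. */
int mda_make(struct mdaGeneric *mda, size_t elemsize, size_t n)
{
    mda->n = n;
    mda->elemsize = elemsize;
    mda->data = mda_malloc(n * elemsize);
    if (mda->data == NULL)
        return -1;

    return 0;
}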

In this case, the MultideviceArray type somehow needs to have access to size properties of the contained type k to allocate sufficient memory.
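As a sketch of what this could mean in practice, the snippet below fills in the pieces that the C code above leaves out. The layout of `struct mdaGeneric`, the `malloc`-based `mda_malloc`, and the wrapper `mda_make_int32` are all assumptions made for the example; only `mda_make` itself follows the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout; the code above only shows the fields n, elemsize and data. */
struct mdaGeneric {
    size_t n;
    size_t elemsize;
    void  *data;
};

/* Placeholder allocator: a real multi-device version would pick a device. */
static void *mda_malloc(size_t bytes)
{
    return malloc(bytes);
}

int mda_make(struct mdaGeneric *mda, size_t elemsize, size_t n)
{
    assert(mda != NULL);
    mda->n = n;
    mda->elemsize = elemsize;
    mda->data = mda_malloc(n * elemsize);
    if (mda->data == NULL)
        return -1;

    return 0;
}

/* Hypothetical call that generated code for `MultideviceArray Int32` could make:
   the element size comes from the type argument k, not from the user. */
int mda_make_int32(struct mdaGeneric *mda, size_t n)
{
    return mda_make(mda, sizeof(int32_t), n);
}
```

The point is that `elemsize` never appears in the MExpr-level call `mdaMake n`. It has to be derived from the type parameter `k`, which is exactly the information the externals system would need to convey.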

I'm thinking that these features would be more on the optional side, such that types (internal and external) don't have to provide this information if it doesn't make sense for that type. For example:

-- This is a valid type, since the storage properties of Int32 are well defined!
-- (Int32 in this case would be another external type.)
let foo : MultideviceArray Int32 = ... in

-- This is NOT a valid type, since the storage properties of symbols are opaque.
let bar : MultideviceArray Symbol = ... in   -- TYPE ERROR

-- This could be a valid type, since the storage properties of Float can be well-defined for a specific instance of Miking.
let baz : MultideviceArray Float = ... in
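One way to model "optional" storage properties is a lookup that can answer "unknown". The C sketch below is only an illustration of that idea: the type tags, the `StorageInfo` struct, and the choice of treating Float as a C `double` are assumptions, not part of any existing Miking design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for the element types named in the examples above. */
typedef enum { TY_INT32, TY_FLOAT, TY_SYMBOL } TypeTag;

/* Storage information is optional: `known` is false for opaque types. */
typedef struct {
    bool   known;
    size_t size;
} StorageInfo;

/* Look up storage properties. Treating Float as a C double is an assumption
   about one particular instance of Miking, not a general fact. */
StorageInfo storage_of(TypeTag ty)
{
    switch (ty) {
    case TY_INT32:  return (StorageInfo){ true, sizeof(int32_t) };
    case TY_FLOAT:  return (StorageInfo){ true, sizeof(double) };
    case TY_SYMBOL: return (StorageInfo){ false, 0 };
    }
    assert(0 && "unknown type tag");
    return (StorageInfo){ false, 0 };
}

/* The check a type checker could run for `MultideviceArray k`. */
bool mda_elem_type_ok(TypeTag k)
{
    return storage_of(k).known;
}
```

Under these assumptions, `MultideviceArray Symbol` is rejected because no size is known, while `Int32` and `Float` are accepted and their sizes can be forwarded to a call such as `mda_make`.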