poorna2152 / nballerina

WebAssembly Backend for the nBallerina compiler
https://ballerina.io/
Apache License 2.0

Representing `any` in `WASM` #13

Closed poorna2152 closed 2 years ago

poorna2152 commented 2 years ago

In subset 2, any represents either an integer or a boolean. A boolean is represented as an i32 in WASM and an integer as an i64, so any should map to an i64. When a boolean is assigned to an any, it should be converted to an i64; i64.extend_i32_u zero-extends an i32 (treating it as unsigned) to an i64.
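
As a rough illustration of the conversion described above, the sketch below models i64.extend_i32_u (and, for contrast, i64.extend_i32_s) in JavaScript with BigInt. The function names extendI32U and extendI32S are hypothetical helpers, not part of any WASM API. It also shows why plain widening is not enough on its own: after extension, the boolean true and the integer 1 are the same i64.

```javascript
// Hypothetical JavaScript models of two WASM instructions (names are assumptions).

// i64.extend_i32_u: reinterpret the 32 bits as unsigned, then widen.
function extendI32U(x) {
  return BigInt(x >>> 0);
}

// i64.extend_i32_s: keep the sign while widening (shown only for contrast).
function extendI32S(x) {
  return BigInt(x | 0);
}

// A boolean is an i32 holding 0 or 1, so both extensions agree on it.
const boolAsI64 = extendI32U(1);

// They differ only when the top bit of the i32 is set.
const allOnesUnsigned = extendI32U(-1); // 0x00000000FFFFFFFF
const allOnesSigned = extendI32S(-1);   // 0xFFFFFFFFFFFFFFFF as a signed i64

// The widened boolean is indistinguishable from the integer 1.
const ambiguous = boolAsI64 === 1n;
```

Because of that ambiguity, a bare i64 can carry the value but not the type, which is what the tagging discussed later in the thread addresses.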

poorna2152 commented 2 years ago

AssemblyScript does not support any or union types.

However, generics let us write the following in AssemblyScript:

export function add32(x: i32, y: i32): i32 {
  return add(x, y);
}

export function add64(x: i64, y: i64): i64 {
  return add(x, y);
}

export function add<T>(a: T, b: T): T {
  return a + b;
}

The types that T is instantiated with when add is called must be known at compile time, so that the compiler can generate a separate function for each of those types:

(func $add<i64> (param $0 i64) (param $1 i64) (result i64)
  local.get $0
  local.get $1
  i64.add
)
(func $add<i32> (param $0 i32) (param $1 i32) (result i32)
  local.get $0
  local.get $1
  i32.add
)

Assignability (https://www.assemblyscript.org/types.html#assignability): this too seems to be a compile-time feature.

poorna2152 commented 2 years ago

JavaScript can be compiled to WASM using NectarJS. This produces a WASM file, but when that file was converted to WAT it gave no useful information about how the types were handled.

poorna2152 commented 2 years ago

Current implementation:

Example:

import ballerina/io;
public function main() {
    foo(57); // @output 57
    foo(()); // @output 
    foo(9223372036854775807); // @output 9223372036854775807
}

function foo(any x) {
    io:println(x);
}

WAT

(module
  (import "console" "log" (func $println (param i64))) 
  (memory $0 1 256) 
  (global $offset (mut i32) (i32.const 0)) 
  (export "memory" (memory $0)) 
  (export "main" (func $main)) 
  (export "foo" (func $foo)) 
  (func $main 
     (local $0 i64) 
     (local $1 i64) 
     (local $2 i64) 
     (block 
       (call $foo 
         (call $int_to_tagged 
           (i64.const 57))) 
       (call $foo 
         (i64.const 2305843009213693952)) 
       (call $foo 
         (call $int_to_tagged 
           (i64.const 9223372036854775807))) 
       (return))) 
  (func $foo (param $0 i64) 
     (local $1 i64) 
     (block 
       (call $println 
         (local.get $0)) 
       (return))) 
  (func $int_to_tagged (param $0 i64) (result i64) 
     (local $1 i32) 
     (if 
       (i32.and 
         (i64.gt_s 
           (local.get $0) 
           (i64.const -36028797018963968)) 
         (i64.lt_s 
           (local.get $0) 
           (i64.const 36028797018963967))) 
       (return 
         (i64.or 
           (i64.or 
             (i64.and 
               (local.get $0) 
               (i64.const 72057594037927935)) 
             (i64.const 2305843009213693952)) 
           (i64.const 504403158265495552))) 
       (block 
         (i64.store 
           (global.get $offset) 
           (local.get $0)) 
         (local.set $1 
           (global.get $offset)) 
         (global.set $offset 
           (i32.add 
             (global.get $offset) 
             (i32.const 8))) 
         (return 
           (i64.or 
             (i64.const 504403158265495552) 
             (i64.extend_i32_u 
               (local.get $1)))))))) 

println definition in JavaScript:

     console: {
        log: function(arg) {
          if (Number(arg & IMMEDIATE_FLAG) != 0) {
            console.log(getImmediateValue(arg).toString());
          }
          else {
            let loc = Number(arg & ((2n**32n) - 1n));
            let x = (BigInt(memory[2*loc + 1]) << 32n) | (BigInt(memory[2*loc]));
            console.log(x.toString())
          }
        }
      },
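
To make the bit layout in $int_to_tagged easier to follow, here is a minimal JavaScript sketch of the same tagging and its inverse, using BigInt. The constants are taken from the WAT above; the names PAYLOAD_MASK, TAG_INT, intToTagged and taggedToInt, and the heap array standing in for linear memory, are assumptions made for illustration. IMMEDIATE_FLAG is assumed to be 2^61, since that is the constant OR-ed into immediates and passed on its own for ().

```javascript
// Constants read off the WAT above; the names are assumptions.
const PAYLOAD_MASK = (1n << 56n) - 1n; // 72057594037927935
const IMMEDIATE_FLAG = 1n << 61n;      // 2305843009213693952
const TAG_INT = 7n << 56n;             // 504403158265495552

// Hypothetical stand-in for linear memory: one slot per boxed integer.
const heap = [];

function intToTagged(n) {
  // Values that fit in 56 signed bits are stored inline. (The WAT uses strict
  // comparisons, so it boxes the two boundary values as well.)
  if (n >= -(1n << 55n) && n < (1n << 55n)) {
    return (n & PAYLOAD_MASK) | IMMEDIATE_FLAG | TAG_INT;
  }
  // Otherwise box the value and return the tag combined with its location.
  heap.push(n);
  return TAG_INT | BigInt(heap.length - 1);
}

function taggedToInt(t) {
  if ((t & IMMEDIATE_FLAG) !== 0n) {
    // Sign-extend the 56-bit payload back to a full integer.
    return BigInt.asIntN(56, t & PAYLOAD_MASK);
  }
  // Boxed: the low 32 bits hold the location.
  return heap[Number(t & 0xffffffffn)];
}
```

For example, taggedToInt(intToTagged(57n)) gives back 57n through the immediate path, while 9223372036854775807n takes the boxed path, which is the allocation that raises the GC question.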

poorna2152 commented 2 years ago

Possible method:

poorna2152 commented 2 years ago

From discussion: It is okay to allocate space in memory if WASM gc takes care of it. Else we have to do gc ourselves or find a better way of representing any and unions.

manuranga commented 2 years ago

> It is okay to allocate space in memory if WASM gc takes care of it.

It will not be GCed by the WASM engine.

> Else we have to do gc ourselves

We prefer not to do this.

The plan is to look at https://github.com/WebAssembly/gc and find a strategy that will work when GC is available in the WASM engine. We'll also have to see whether there is a reference implementation of this; if not, we have to find or implement a polyfill scheme so that we can continue development.

manuranga commented 2 years ago

Please take a look at https://github.com/WebAssembly/gc/issues/130#issuecomment-1029368340

poorna2152 commented 2 years ago

From what I understood,

poorna2152 commented 2 years ago

Therefore, my initial idea was to:

  1. Use the tagged representation as in nBallerina, and when a value needs to be boxed (e.g. an integer that cannot be represented in 56 bits), store it in a struct and store the struct in memory. Here any in Ballerina would map to an i64 in WASM whose value is either an immediate value or a pointer into memory. Since structs are garbage-collected, this would solve the problem. But it seems that a struct cannot be stored in memory (there is no instruction to store a struct in memory).

  2. The other option is to use the i31 and struct types. Tagging functions would convert an i64 to either an i31 or a struct; both are subtypes of the ref type. When the value to convert can be represented in 31 bits or fewer we can use i31; otherwise we need to use the struct type. Thus any in Ballerina would map to a ref type in WASM.
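
The second option can be sketched in JavaScript as follows. This is only a model: plain objects stand in for GC references, and the names toRef, fromRef and the kind field are hypothetical. It assumes a signed 31-bit payload, as read back by i31.get_s.

```javascript
// Signed 31-bit range (assumption: the payload is read back sign-extended).
const I31_MIN = -(1n << 30n);
const I31_MAX = (1n << 30n) - 1n;

// Convert an i64 (a BigInt here) to a model of a ref-typed value.
function toRef(n) {
  if (n >= I31_MIN && n <= I31_MAX) {
    // Small enough: no allocation would be needed, the value lives in the ref.
    return { kind: "i31", value: Number(n) };
  }
  // Too large: box it in a struct that the engine's GC would manage.
  return { kind: "struct", val: n };
}

// Recover the i64 from either representation.
function fromRef(r) {
  return r.kind === "i31" ? BigInt(r.value) : r.val;
}
```

Under these assumptions, toRef(21n) takes the i31 path, while toRef(1n << 30n) falls just outside the range and is boxed.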

poorna2152 commented 2 years ago

So out of the 31 bits we have,

poorna2152 commented 2 years ago

poorna2152 commented 2 years ago

As suggested, booleans are represented with i31ref, ints with structs, and null with a null ref. For the following program:

import ballerina/io;

public function main() {
    any x = true;
    any y = 21;
    any z = ();
    io:println(x);
    io:println(y);
    io:println(z);
}

WAT

(module
  (type $BoxedInt (struct (field $val i64)))
  (import "console" "log" (func $println (param anyref)))
  (export "main" (func $main))
  (export "tagged_to_int" (func $tagged_to_int))
  (export "tagged_to_boolean" (func $tagged_to_boolean))
  (export "get_type" (func $get_type))
  (func $main
    (local $0 anyref)
    (local $1 anyref)
    (local $2 anyref)
    (block
      (local.set $0
        (call $boolean_to_tagged
          (i32.const 1)))
      (local.set $1
        (call $int_to_tagged
          (i64.const 21)))
      (call $println
        (local.get $0))
      (call $println
        (local.get $1))
      (call $println
        (local.get $2))
      (return)))
  (func $int_to_tagged (param $0 i64) (result (ref $BoxedInt))
    (return
      (struct.new_with_rtt
        $BoxedInt
        (local.get $0)
        (rtt.canon $BoxedInt))))
  (func $tagged_to_int (param $0 anyref) (result i64)
    (return
      (struct.get
        $BoxedInt
        $val
        (ref.cast
          (ref.as_data
            (local.get $0))
          (rtt.canon $BoxedInt)))))
  (func $boolean_to_tagged (param $0 i32) (result i31ref)
    (return
      (i31.new
        (local.get $0))))
  (func $tagged_to_boolean (param $0 anyref) (result i32)
    (return
      (i31.get_u
        (ref.as_i31
          (local.get $0)))))
  (func $get_type (param $0 anyref) (result i32)
    (if
      (ref.is_i31
        (local.get $0))
      (return 
        (i32.const 1))
      (if
        (ref.is_null
          (local.get $0))
        (return 
          (i32.const 0))
        (return 
          (i32.const 2))))))
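
The module above exports get_type, tagged_to_int and tagged_to_boolean, presumably so that the host can decode an anyref it receives. A minimal sketch of what the host-side println might look like is given below. This is an assumption, not the actual glue code from the repository; makePrintln and its parameters are hypothetical names. The type codes (0 for null, 1 for i31, 2 for struct) follow $get_type.

```javascript
// Hypothetical host glue: builds the function imported as console.log.
// getExports is a callback because the import object has to exist before
// the module is instantiated, while the exports only exist afterwards.
function makePrintln(getExports, out = console.log) {
  return function println(ref) {
    const { get_type, tagged_to_int, tagged_to_boolean } = getExports();
    switch (get_type(ref)) {
      case 0: // null ref: () prints as an empty line
        out("");
        break;
      case 1: // i31ref: a boolean stored as 0 or 1
        out(tagged_to_boolean(ref) !== 0 ? "true" : "false");
        break;
      case 2: // struct: a boxed i64, which arrives in JavaScript as a BigInt
        out(tagged_to_int(ref).toString());
        break;
    }
  };
}
```

With a real instance this would be wired up along the lines of { console: { log: makePrintln(() => instance.exports) } }, assuming an engine that implements the GC proposal.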

Using a null ref to represent null would mean that the corresponding variable is never explicitly initialized (in the above program, local $2 is not initialized). Wouldn't it be preferable to have a separate value that represents null, for example an i31ref value?