poorna2152 / nballerina

WebAssembly Backend for the nBallerina compiler
https://ballerina.io/
Apache License 2.0

String representation in wasm #26

Closed poorna2152 closed 2 years ago

poorna2152 commented 2 years ago

There are array operations,

However, these operations are not defined in the MVP document, and they don't work with the recommended Node.js version, 16.14; they require Node.js 17 or later. There are, however, issues in the GC repo in which these operations are mentioned.

If used, these operations would make string operations and construction easier, and would also simplify the previously defined list operations. Should they be used?

poorna2152 commented 2 years ago

When using the JavaScript string type, assume a JS object with the following properties is used to store a string,

{
    value: // stores the string,
    surrogate: [] // stores the indices of surrogates
}

wasm

(module 
  (type $String (struct (field $val (mut externref)))) 
  (import "string" "create" (func $str_create (param eqref) (result externref))) 
  (func $main 
    (local $0 eqref) 
    (local.set $0
      (array.init $chars
        (i32.const 97) ;; "abc"
        (i32.const 98) 
        (i32.const 99) 
        (rtt.canon $chars)))
    (call $println 
      (struct.new_with_rtt $String
        (call $str_create
          (local.get $0))
        (rtt.canon $String))))) 

JS

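// strLen and getCharAt are assumed to be helpers that read the wasm array behind ref.
const create_string = (ref) => {
  const chars = [];
  const length = strLen(ref);
  const surrogate = [];
  for (let index = 0; index < length; index++) {
    const char = getCharAt(ref, index);
    chars.push(char);
    // check for surrogate
  }
  // Treat the collected values as UTF-8 bytes and decode them into a JS string.
  const bytes = new Uint8Array(chars);
  const string = new TextDecoder('utf8').decode(bytes);
  return {
    value: string,
    surrogate: surrogate
  };
};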
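The surrogate check left as a comment above could be sketched roughly as follows. This is only an illustration, assuming the surrogate array holds the index of each high (leading) surrogate code unit of the JS string; makeBalString is a hypothetical name and not part of the project.

```javascript
// Hypothetical helper: wraps a JS string in the { value, surrogate } shape.
function makeBalString(value) {
  const surrogate = [];
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    // 0xD800..0xDBFF is the high (leading) surrogate range in UTF-16.
    if (unit >= 0xd800 && unit <= 0xdbff) {
      surrogate.push(i); // assumption: one entry per surrogate pair
    }
  }
  return { value, surrogate };
}

// "a", U+1F600 (stored as a surrogate pair), "b"
const sample = makeBalString("a\u{1F600}b");
```

With this layout, the number of entries in surrogate is the difference between the UTF-16 length and the number of code points, as long as the string is well formed.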
manuranga commented 2 years ago

I don't think you need to convert; I think we should be able to rely on Node's ability to read UTF-8 source files. We should try to put the string as is, e.g. (data (i32.const 0x42) "Hello, Reference Types!\n"). See ref: https://bytecodealliance.org/articles/reference-types-in-wasmtime

manuranga commented 2 years ago

Oh wait, the above seems to be available as UTF-8, not UTF-16. Hmm, let me think about it.

poorna2152 commented 2 years ago

The data section stores things in linear memory, doesn't it? Wouldn't we lose the GC if we do that?

poorna2152 commented 2 years ago

Steps in my initial attempt for initializing a string const in wasm,

What most of the tutorials do to initialize a string constant is use the data section of the wasm module. This stores the string in linear memory, and the stored value is not garbage collected. However, we could use that as the static memory of the program. https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format#webassembly_memory

(data (i32.const 0x42) "Hello, Reference Types!\n")
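On the JS side, reading such a data segment back could look roughly like the sketch below. It assumes the module exports or imports a WebAssembly.Memory and that the string bytes are UTF-8; here the memory is created and filled by hand to stand in for the data segment, and strCreate is a hypothetical JS counterpart of an offset/length based create function.

```javascript
// Stand-in for the module's linear memory (one 64 KiB page).
const memory = new WebAssembly.Memory({ initial: 1 });

// Simulate the data segment by writing the bytes at offset 0x42.
const text = "Hello, Reference Types!\n";
const encoded = new TextEncoder().encode(text);
new Uint8Array(memory.buffer).set(encoded, 0x42);

// Hypothetical helper: build a JS string from an offset and a byte length.
function strCreate(offset, length) {
  const bytes = new Uint8Array(memory.buffer, offset, length);
  return new TextDecoder("utf-8").decode(bytes);
}

const decoded = strCreate(0x42, encoded.length);
```

The bytes stay in linear memory for the lifetime of the module, while the decoded JS string is an ordinary garbage-collected value.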

In this approach, strings in wasm would still be represented by a wasm struct that stores a reference to a JS string. The main steps are,

@jclark can you comment on these approaches for initializing strings in wasm?

poorna2152 commented 2 years ago

In the last meeting, what I said about externref and eqref not being subtypes was wrong. A variable of type externref can be assigned to a variable of type anyref or eqref (not vice versa). I am currently storing the externref returned from JS inside a struct of type $String. Since the externref is stored inside a struct of type $String, I can get the type as String.

poorna2152 commented 2 years ago
import ballerina/io;

public function main() {
    string y = "b";
    foreach int i in 0...9 {
        y = y + "a";
    }
    io:println(y);
}

With my initial method, this would create the same string "a" again and again in the loop body. For one iteration of the loop, creating a string involves,

If we are to use the table approach (a table can only store an externref or a funcref, so it can only store the returned externref, not the created struct), we create the strings once beforehand and store the externrefs in a table. Then, for each iteration of the above loop, we have to

The only difference between the table approach and the initial one is the repeated JS call. Instead of doing this, can't we rely on V8 to do the optimizations and use the initial approach?
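The difference in the number of JS calls can be illustrated with a small pure-JS model. This is not the wasm implementation: strCreate, runDirect and runPooled are hypothetical names, a plain array stands in for the wasm table, and the counter stands in for wasm-to-JS calls.

```javascript
// Pure-JS model; createCalls stands in for the number of wasm-to-JS calls.
let createCalls = 0;
function strCreate(text) {
  createCalls += 1;
  return { value: text, surrogate: [] };
}

// Initial approach: every use of the literal "a" calls into JS again.
function runDirect(iterations) {
  createCalls = 0;
  let y = strCreate("b").value;
  for (let i = 0; i < iterations; i++) {
    y = y + strCreate("a").value;
  }
  return { result: y, calls: createCalls };
}

// Table approach: each literal is created once and then looked up by index.
function runPooled(iterations) {
  createCalls = 0;
  const table = [strCreate("b"), strCreate("a")];
  let y = table[0].value;
  for (let i = 0; i < iterations; i++) {
    y = y + table[1].value;
  }
  return { result: y, calls: createCalls };
}
```

For the 10 iterations of the Ballerina loop above, the direct version makes 11 create calls and the pooled version makes 2, while both produce the same string.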

manuranga commented 2 years ago

Can we create global variables for each string (e.g. for "a" we would create $bal$str$0), set the value of each global variable once at the beginning of the program, and reuse it elsewhere?

poorna2152 commented 2 years ago

I compared the performance of my initial approach and the table approach using hyperfine with the above program. The table performs better. For loops with 10 and 1000 iterations, the table approach was 1.11 times faster than the direct approach, and with 100000 iterations it was 3.6 times faster.

poorna2152 commented 2 years ago

I also think it is possible to do this with global variables. Let me test it and check.

manuranga commented 2 years ago

In the global variables approach you can keep the struct instead of the externref.

poorna2152 commented 2 years ago

I ran it using globals, storing the struct in a global.

(global $bal$str0 (mut eqref) (ref.null data))
(global $bal$str1 (mut eqref) (ref.null data)) 

(global.set $bal$str0
  (struct.new_with_rtt $String 
    (call $str_create 
      (i32.const 1) 
      (i32.const 1))
    (rtt.canon $String))) 

Still, the hyperfine results showed that the table approach ran 2.3 times faster for 100000 iterations, while for 1000 iterations the two ran at about the same speed (table 1.02 times faster).

poorna2152 commented 2 years ago

I am initializing the global variables and the table slots at the beginning of the main function.

manuranga commented 2 years ago

That is unexpected. We also need to test cases where the surrogate array is not empty. I assume global variables are faster at least in those.

poorna2152 commented 2 years ago

They seem to be running at the same speed now. The above may have been a mistake on my end. For 10 iterations,

'node --experimental-wasm-eh --experimental-wasm-gc main.js table.wasm --import' ran
    1.01 ± 0.07 times faster than 'node --experimental-wasm-eh --experimental-wasm-gc main.js global.wasm --import'

For 100000 iterations,

'node --experimental-wasm-eh --experimental-wasm-gc main.js global.wasm --import' ran
    1.00 ± 0.04 times faster than 'node --experimental-wasm-eh --experimental-wasm-gc main.js table.wasm --import'
poorna2152 commented 2 years ago

Choosing to go with the globals.

All the public functions in a Ballerina program are exported from the wasm module, so any public function can be called from the JavaScript side. Currently, however, only the main function is called from JS, so the global variables are initialized in the main function. If another public function is called from JS without calling main first, the global variables will not be initialized. Possible solutions:

  1. Since a Ballerina program only runs when its main function is called, keep the implementation as it is.
  2. Otherwise, create a separate function (init_strings) that initializes the globals, and add a call to it at the start of every public function in the module. Use a global variable to ensure that init_strings only does its work once.

I feel like option 1 is what needs to happen.
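Option 2 amounts to an initialize-once guard. A minimal JS model of the idea is sketched below; the real version would be wasm globals and a wasm function, and all names here (stringGlobals, initStrings, otherPublicFunction) are hypothetical.

```javascript
// Hypothetical stand-ins for the wasm globals and the init_strings function.
const stringGlobals = {};
let stringsInitialized = false; // plays the role of the guard global
let initRuns = 0;               // only here to observe how often init ran

function initStrings() {
  if (stringsInitialized) return; // guard: do the work only once
  stringsInitialized = true;
  initRuns += 1;
  stringGlobals.str0 = { value: "b", surrogate: [] };
  stringGlobals.str1 = { value: "a", surrogate: [] };
}

// Every exported (public) function starts with a call to initStrings().
function main() {
  initStrings();
  return stringGlobals.str0.value + stringGlobals.str1.value;
}

function otherPublicFunction() {
  initStrings();
  return stringGlobals.str1.value;
}
```

The cost of this option is one extra call and one flag check at the start of every public function, which option 1 avoids.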
poorna2152 commented 2 years ago

It is possible to import String.prototype as a whole and access its functions in wasm. However, to make it complete we would have to add functions such as str_create (which takes an offset and a length and creates a string), string_equal and string_compare to String.prototype.

poorna2152 commented 2 years ago

In the meeting, as I understood it, what was suggested was to,

If we are to use that approach, then since tables can only store externrefs, the surrogate array would need to be an externref as well. Storing surrogates as an externref would mean that accessing elements of the surrogate array and retrieving its length depend on JavaScript functions (which we would need to import). This means there would be more wasm-JS calls.

Since we already have an array representation in wasm, isn't it preferable to use it for storing the surrogates?

What I have in mind is to keep storing strings as structs held in globals. This struct contains two fields,

(type $String (struct (field $val (mut externref)) (field $surrogate (mut (ref array)))))

val field: stores the JS string as an externref.
surrogate field: stores the surrogates as a wasm array.

Import String.prototype into wasm. Define the string functions as wasm functions and use the functions imported from String.prototype to implement them.

(module 
  (import "string" "length" (func $js_str_length (param externref) (result i64))) 
  (func $str_length (param $0 eqref) (result i32) 
    (return 
      (i32.sub
        (i32.wrap_i64 ;; the import returns i64, but i32.sub needs i32 operands
          (call $js_str_length  ;; get JS string length
            (struct.get $String $val
              (local.get $0))))
        (call $arr_len ;; get surrogate array length
          (struct.get $String $surrogate 
            (local.get $0)))))))
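On the JS side, the "string" "length" import and the subtraction done in $str_length could be modelled as below. This is a sketch under assumptions: the import object shape is hypothetical, the i64 result is passed across the boundary as a BigInt, and the surrogate array is assumed to hold one entry per surrogate pair.

```javascript
// Assumed import object for the "string" module; i64 results cross the boundary as BigInt.
const importObject = {
  string: {
    length: (s) => BigInt(s.length),
  },
};

// JS model of $str_length: UTF-16 length minus the number of recorded surrogates.
function strLength(balString) {
  const jsLength = Number(importObject.string.length(balString.value));
  return jsLength - balString.surrogate.length;
}

// "a", U+1F600 (a surrogate pair starting at index 1), "b"
const example = { value: "a\u{1F600}b", surrogate: [1] };
const codePoints = strLength(example);
```

Here the JS string has 4 UTF-16 code units and one recorded surrogate, so the model reports 3 code points.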
poorna2152 commented 2 years ago

The $String type in wasm is a struct with four fields,