Closed poorna2152 closed 2 years ago
There are two array operations:
array.init
: initializes an array with values when it is created. (Without this, we need to create the array first and then add elements one by one.)
https://github.com/WebAssembly/binaryen/blob/main/test/heap-types.wast#L383
array.copy
: copies a contiguous part of one array to another. (Useful when extending an array, and for string concatenation and substrings.)
https://github.com/WebAssembly/binaryen/blob/main/test/heap-types.wast#L374
However, these operations are not defined in the MVP document, and they don't work with the recommended Node.js version, 16.14; they require Node.js version 17 or later. There are, however, issues in the gc repo in which these operations are mentioned.
These operations, if used, would make string operations and construction easier and would simplify the previously defined list operations. Should they be used?
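As a rough illustration of why these two operations help, the following JavaScript sketch mimics them with typed arrays. It is only an analogy: arrayInit and arrayCopy are hypothetical helper names, not part of any wasm or Binaryen API, and the argument order of arrayCopy is an assumption.

```javascript
// Hypothetical analogue of array.init: create an array with its values in one step.
const arrayInit = (...values) => Uint32Array.from(values);

// Hypothetical analogue of array.copy: copy `length` elements from `src`
// (starting at `srcIndex`) into `dest` (starting at `destIndex`).
const arrayCopy = (dest, destIndex, src, srcIndex, length) => {
  dest.set(src.subarray(srcIndex, srcIndex + length), destIndex);
};

// String concatenation expressed as two contiguous copies into a fresh array.
const concat = (a, b) => {
  const result = new Uint32Array(a.length + b.length);
  arrayCopy(result, 0, a, 0, a.length);
  arrayCopy(result, a.length, b, 0, b.length);
  return result;
};

const ab = arrayInit(97, 98); // code points of "ab"
const c = arrayInit(99);      // code point of "c"
const abc = concat(ab, c);    // code points of "abc"

// A substring is a single contiguous copy.
const bc = new Uint32Array(2);
arrayCopy(bc, 0, abc, 1, 2);
```

Without a copy operation, both concat and the substring step would need an element-by-element loop.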
When using the JavaScript string type, assume a JS object with the following properties is used to store a string:
{
value: // stores the string,
surrogate: [] // stores the indices of surrogates
}
Creating a string: when creating a string, pass the code points of each character to an imported JavaScript function (create_string) as a wasm array. JavaScript creates a string from the code points and records the indices where a surrogate is needed. It returns the created JS object, which is stored in a struct in wasm.
(type $String (struct (field $val (mut externref))))
Define the string functions using JavaScript string functions and import them into wasm. (We need to account for surrogates when defining string functions, e.g. in the length function.)
wasm
(module
(type $String (struct (field $val (mut externref))))
(import "string" "create" (func $str_create (param eqref) (result externref)))
(func $main
(local $0 eqref)
(local.set $0
(array.init $chars
(i32.const 97) ;; "abc"
(i32.const 98)
(i32.const 99)
(rtt.canon $chars)))
(call $println
(struct.new_with_rtt $String
(call $str_create
(local.get $0))
(rtt.canon $String)))))
JS
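// strLen and getCharAt are assumed to be helpers that read the wasm array.
const create_string = (ref) => {
  const codePoints = [];
  const surrogate = [];
  const length = strLen(ref);
  for (let index = 0; index < length; index++) {
    const codePoint = getCharAt(ref, index);
    codePoints.push(codePoint);
    // A code point above 0xFFFF needs a surrogate pair in a JS string.
    if (codePoint > 0xFFFF) {
      surrogate.push(index);
    }
  }
  // Build the string from code points; decoding them as UTF-8 bytes
  // would only be correct for ASCII input.
  const string = String.fromCodePoint(...codePoints);
  return {
    value: string,
    surrogate: surrogate
  };
}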
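To make the role of the surrogate list concrete, here is a small self-contained sketch of the object layout above together with a length function that counts characters rather than UTF-16 code units. createString and strLength are hypothetical names, and recording the code point index of each surrogate pair is an assumption about what the surrogate array holds.

```javascript
// Build the { value, surrogate } object from an array of code points.
const createString = (codePoints) => {
  const surrogate = [];
  codePoints.forEach((codePoint, index) => {
    // Code points above 0xFFFF are encoded as a surrogate pair in JS strings.
    if (codePoint > 0xFFFF) surrogate.push(index);
  });
  return { value: String.fromCodePoint(...codePoints), surrogate };
};

// Each surrogate pair takes two UTF-16 code units but is a single character,
// so subtract one per recorded surrogate.
const strLength = (str) => str.value.length - str.surrogate.length;

const ascii = createString([97, 98, 99]);  // "abc"
const mixed = createString([97, 0x1F600]); // "a" followed by an emoji
```

For the second string, the JS length property reports 3 code units, while strLength reports 2 characters.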
I don't think you need to convert. I think we should be able to rely on Node's ability to read UTF-8 source files.
We should try to put the string in as is, e.g. (data (i32.const 0x42) "Hello, Reference Types!\n")
See ref: https://bytecodealliance.org/articles/reference-types-in-wasmtime
Oh wait, the above seems to be available as UTF-8, not UTF-16. Hmm, let me think about it.
The data section stores things in linear memory, doesn't it? Wouldn't we lose the GC if we do that?
The steps in my initial attempt at initializing a string constant in wasm:
UTF-8
code points of each character
(array.init $chars ;; string: "abc"
  (i32.const 97)
  (i32.const 98)
  (i32.const 99)
  (rtt.canon $chars))
What most tutorials do to initialize a string constant is use the data section of the wasm module. This stores the string in linear memory, and the stored value is not garbage collected. However, we could use it as the program's static memory. https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format#webassembly_memory
(data (i32.const 0x42) "Hello, Reference Types!\n")
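For comparison, here is a minimal sketch of how the JavaScript side might read such a data-section string out of linear memory. The memory is created and filled by hand to stand in for the wasm module's data segment; readString is a hypothetical helper, and the offset 0x42 simply mirrors the example above.

```javascript
// Stand-in for the module's linear memory (one 64 KiB page).
const memory = new WebAssembly.Memory({ initial: 1 });

// Emulate the data segment by writing the UTF-8 bytes at offset 0x42.
const text = "Hello, Reference Types!\n";
const encoded = new TextEncoder().encode(text);
new Uint8Array(memory.buffer).set(encoded, 0x42);

// Decode `length` UTF-8 bytes starting at `offset` into a JS string.
const readString = (mem, offset, length) =>
  new TextDecoder("utf-8").decode(new Uint8Array(mem.buffer, offset, length));

const decoded = readString(memory, 0x42, encoded.length);
```

The bytes stay in linear memory for the lifetime of the module; only the decoded JS string is subject to garbage collection.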
In this approach, strings in wasm would still be represented by a wasm struct that stores a reference to a JS string. The main steps are,
@jclark, can you comment on these approaches to initializing strings in wasm?
In the last meeting, what I said about externref and eqref not being subtypes was wrong.
A variable of type externref can be assigned to a variable of type anyref or eqref (but not vice versa).
I am currently storing the externref returned from JS inside a struct of type $String. Since the externref is stored inside a struct of type $String, I can recover the type as String.
import ballerina/io;
public function main() {
string y = "b";
foreach int i in 0...9 {
y = y + "a";
}
io:println(y);
}
With my initial method, this would create the same string "a" again and again in the loop body. For one iteration of the loop, creating the string involves,
If we are to use the table approach: a table can only store an externref or a funcref, so it can only store the returned externref, not the created struct.
If we create the strings once beforehand and store the externrefs in a table, then for each iteration of the above loop we have to
The only difference between the table approach and the initial one is the repeated JS call. Instead of doing this, can't we rely on V8 to do the optimizations and use the initial approach?
Can we create a global variable for each string (e.g. for "a" we would create $bal$str$0), set its value once at the beginning of the program, and reuse it elsewhere?
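The idea of creating each string constant once and reusing it can be sketched on the JavaScript side as follows. This only illustrates the caching pattern, not the wasm globals or table themselves; strCreate, constants, and getConstant are hypothetical names.

```javascript
// Count how often the (comparatively expensive) creation function runs.
let createCalls = 0;
const strCreate = (codePoints) => {
  createCalls++;
  return String.fromCodePoint(...codePoints);
};

// One slot per string constant, filled once at program start.
const constants = [
  strCreate([98]), // slot 0: "b"
  strCreate([97]), // slot 1: "a"
];
const getConstant = (slot) => constants[slot];

// Equivalent of the Ballerina loop above: ten iterations, 0 through 9.
let y = getConstant(0);
for (let i = 0; i <= 9; i++) {
  y = y + getConstant(1); // reuses the cached "a"; no strCreate call here
}
```

With this layout strCreate runs once per distinct constant, regardless of how many times the loop body executes.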
I compared the performance of my initial approach and the table approach using hyperfine with the above program. The table performs better. For loops with 10 and 1000 iterations, the table was 1.11 times faster than the direct approach, and with 100000 iterations it was 3.6 times faster.
I also think it is possible to do with global variables. Let me test it and check.
In the global variables approach, you can keep the struct instead of the externref.
I ran it using globals, storing the struct in a global.
(global $bal$str0 (mut eqref) (ref.null data))
(global $bal$str1 (mut eqref) (ref.null data))
(global.set $bal$str0
(struct.new_with_rtt $String
(call $str_create
(i32.const 1)
(i32.const 1))
(rtt.canon $String)))
Still, the hyperfine results showed that the table ran 2.3 times faster for 100000 iterations; for 1000 iterations they ran at about the same speed (table 1.02 times faster).
I am initializing the global variables and the table slots at the beginning of the main function.
That is unexpected. We also need to test cases where the surrogate array is not empty. I assume global variables are faster, at least in those.
They seem to be running at the same speed now. The above may have been a mistake on my end. For 10,
'node --experimental-wasm-eh --experimental-wasm-gc main.js table.wasm --import' ran
1.01 ± 0.07 times faster than 'node --experimental-wasm-eh --experimental-wasm-gc main.js global.wasm --import'
For 100000,
'node --experimental-wasm-eh --experimental-wasm-gc main.js global.wasm --import' ran
1.00 ± 0.04 times faster than 'node --experimental-wasm-eh --experimental-wasm-gc main.js table.wasm --import'
Choosing to go with the globals.
All the public functions in a Ballerina program are exported from the wasm module, so any public function can be called from the JavaScript side. Currently, however, only the main function is called from JS, so the global variables are initialized in the main function. If another public function is called from the JS side without calling main first, the global variables will not be initialized. As a solution,
init_strings
) which can initialize the globals and create function calls to this function within every public function in the module. Ensure that the init_strings
function is only called once using a global variable.
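The once-only initialization described here could look roughly like the sketch below, written in JavaScript for brevity rather than wasm. initStrings, stringsInitialized, and the two public functions are hypothetical stand-ins for the generated wasm function, the guard global, and the module's exports.

```javascript
let stringsInitialized = false; // stands in for the wasm guard global
let initCount = 0;              // only here to observe how often the body runs
const globals = {};             // stands in for the string globals

const initStrings = () => {
  if (stringsInitialized) return; // later calls are no-ops
  globals.str0 = "b";
  globals.str1 = "a";
  stringsInitialized = true;
  initCount++;
};

// Every public function starts with a call to initStrings.
const main = () => { initStrings(); return globals.str0 + globals.str1; };
const other = () => { initStrings(); return globals.str1; };

// Calling another public function before main still sees initialized globals.
const first = other();
const second = main();
```

The cost per public call is one flag check, which is the trade-off for not requiring callers to invoke main first.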
I feel like option 1 is what needs to happen.
It is possible to directly import String.prototype as a whole and access its functions in wasm. However, we have to add functions like str_create (which takes an offset and a length and creates a string), string_equal, and string_compare to String.prototype to make it complete.
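One way to expose String.prototype methods as plain functions, together with the extra helpers mentioned above, is sketched below. uncurryThis and the shape of stringImports are assumptions made for illustration, and str_create is left out because it depends on how the string data is passed from wasm. The comparison uses JavaScript's code unit ordering, which may differ from the ordering the language requires.

```javascript
// Turn a prototype method into a function that takes the receiver
// as its first argument, which is the shape a wasm import needs.
const uncurryThis = (method) => (self, ...args) => method.apply(self, args);

const stringImports = {
  length: (s) => s.length,
  concat: uncurryThis(String.prototype.concat),
  substring: uncurryThis(String.prototype.substring),
  // Helpers that String.prototype does not provide directly.
  string_equal: (a, b) => (a === b ? 1 : 0),
  string_compare: (a, b) => (a < b ? -1 : a > b ? 1 : 0),
};

// The object would then be supplied as the "string" import module, e.g.
// WebAssembly.instantiate(bytes, { string: stringImports }).
```

Keeping the helpers in a separate import object, rather than mutating String.prototype, avoids touching a built-in that other code may rely on.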
In the meeting, as I understood it, what was suggested was to,
If we are to use that approach, since tables can only store externrefs, the surrogate would need to be an externref as well. Storing the surrogates as an externref would mean that accessing elements of the surrogate and retrieving its length depend on JavaScript functions (which need to be imported). This means there would be more wasm-JS calls.
Since we already have an array representation in wasm, isn't it preferable to use our wasm array representation for storing the surrogates?
What I had in mind is to keep storing strings as structs using globals. This struct contains two fields, (type $String (struct (field $val (mut externref)) (field $surrogate (mut (ref array))))). The val field stores the JS string as an externref; the surrogate is stored as a wasm array.
Import String.prototype into wasm. Define the string functions as wasm functions that use the functions imported from String.prototype.
(module
(import "string" "length" (func $js_str_length (param externref) (result i32))) ;; i32 so the result can be used by i32.sub
(func $str_length (param $0 eqref) (result i32)
(return
(i32.sub
(call $js_str_length ;; get JS string length
(struct.get $String $val
(local.get $0)))
(call $arr_len ;; get surrogate array length
(struct.get $String $surrogate
(local.get $0)))))))
The $String type in wasm is a struct with four fields,
(type $String (struct (field $type i32) (field $val (mut anyref)) (field $surrogate (ref $Surrogate)) (field $hash (mut i32))) (extends $Any))
Strings are initialized as wasm globals.