ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Support WebAssembly Reference Types #10491

Open leroycep opened 2 years ago

leroycep commented 2 years ago

WebAssembly Reference Types are supported in most WebAssembly runtimes at the moment, and they make it easier to interoperate with the host runtime.

On the Discord: https://discord.com/channels/605571803288698900/

Stephen Solka#3548 Does zig's wasm target support reference objects? https://github.com/WebAssembly/reference-types/blob/master/proposals/reference-types/Overview.md I tried to figure it out by searching the code base for the code that declares these externref types. I hit this commit that upstreamed this external linker https://github.com/ziglang/zig/commit/f56ae69edd8c96a5f6525f20bf0a22704a826f00 landing in 0.9.0, but it's not clear whether this is exposed at the language level for people using zig for wasm. I'm trying to figure out the "right way" to pass JS objects to zig wasm.

Stephen Solka#3548 This is rust's bindgen ref types implementation https://rustwasm.github.io/wasm-bindgen/reference/reference-types.html

Later in the thread: https://discord.com/channels/605571803288698900/922695973623443466/927281011618873407

@Luukdegram Hmm, I'm afraid there's no such thing yet. A lot of the stuff is currently in my head, as I have to implement it for the wasm backend anyway. As LLVM does support this, we could support this once the llvm backend of the selfhosted compiler is finished, which is targeted for 0.10.0. We will have to implement the wasm-specific address spaces though, so that will probably be after 0.10.0.

A roadmap in general is probably a good idea, but the selfhosted compiler is in such a high-speed development stage right now, I'd prefer to wait a bit until we have a more solid base. I'll add this point to my personal TODO 😉

As mentioned by Luuk, this feature will need address spaces to be implemented (see #653).

For now we can pass references as integer handles that index into an array or a hash map on the host side.
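A minimal sketch of that integer-handle workaround on the JS side. Everything here is hypothetical and for illustration only: the `HandleTable` class, the `env` import module, and the `createThing`/`thingKind`/`dropThing` functions are assumed names, not part of Zig or any library. The wasm module only ever sees i32 handles, and the host resolves them through a Map.

```javascript
// Hypothetical host-side handle table; none of these names come from Zig.
class HandleTable {
  constructor() {
    this.objects = new Map(); // integer handle -> JS object
    this.nextHandle = 1;      // 0 is reserved to mean "null reference"
  }

  // Store a JS object and return the integer handle that is passed to wasm.
  add(obj) {
    if (obj === null || obj === undefined) return 0;
    const handle = this.nextHandle++;
    this.objects.set(handle, obj);
    return handle;
  }

  // Resolve a handle coming back from wasm; unknown handles throw.
  get(handle) {
    if (handle === 0) return null;
    if (!this.objects.has(handle)) {
      throw new RangeError(`invalid handle ${handle}`);
    }
    return this.objects.get(handle);
  }

  // Forget the object so it can be garbage collected.
  release(handle) {
    return this.objects.delete(handle);
  }
}

const handles = new HandleTable();

// Assumed import object shape: every function takes or returns plain integers.
const imports = {
  env: {
    createThing: (kind) => handles.add({ kind }),
    thingKind: (h) => handles.get(h).kind,
    dropThing: (h) => { handles.release(h); },
  },
};
```

The cost of this approach is that the guest has to call something like `dropThing` explicitly; otherwise the Map keeps every object alive.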

munrocket commented 2 years ago

@leroycep @kubkon how do you expect this to look in Zig?

const Externref = *opaque{}; // like this?
const Externref = anytype; // or this?
// or another syntax?

I tried this and it's not working right now. Even though externref exists in wasm.zig, it does not exist in codegen.zig/air.zig/etc., so basically it's not used right now. However, it exists in LLVM, and adding reference types support looks pretty easy.

Here is a minimal example that should print a WebGLShader object in the browser console (right now it prints 0 when you run npm run build && npm start).

//gl.zig

pub const Externref = *opaque{};

pub const GLenum = u32;
pub const WebGLShader = Externref;
pub const DOMString = [*]const u8;
pub const VERTEX_SHADER: GLenum = 0x8B31;
pub const FRAGMENT_SHADER: GLenum = 0x8B30; // used by main.zig

pub extern "gl" fn createShader(t: GLenum) WebGLShader;

//console.zig

const gl = @import("gl.zig");

pub extern "console" fn log(_: gl.WebGLShader) void;
pub extern "console" fn logF(_: f32) void;
pub extern "console" fn logI(_: c_int) void;

//main.zig

const console = @import("console.zig");
const gl = @import("gl.zig");

export fn main() i32 {

  console.logI(123);
  console.log(gl.createShader(gl.FRAGMENT_SHADER));

  return 0;
}

<!--index.html-->

<!DOCTYPE html>
<html>
<head>
  <link rel="icon" href="data:;base64,iVBORw0KGgo=">
  <title>Test</title>
</head>
<body>
  <canvas id="c"></canvas>
  <script type="module" src="main.js"></script>
</body>
</html>

//main.js

const canvas = document.getElementById('c');
const gl = canvas.getContext('webgl');

const imports = {
  console: {
    log(r) { return console.log(r) },
    logI(i) { return console.log(i) }
  },
  gl: { createShader(t) { return gl.createShader(t); }}
}

WebAssembly.instantiateStreaming(fetch('../main.wasm'), imports).then(obj => {
  const wasm = obj.instance.exports;
  wasm.main();
})

//package.json

{
  "scripts": {
    "build": "zig build-lib main.zig -target wasm32-freestanding -dynamic -OReleaseSmall",
    "start": "npx servez"
  },
  "devDependencies": {
    "servez": "^1.12.1"
  }
}

leroycep commented 2 years ago

The plan is to use address spaces (issue #653) for WASM externrefs. That would look something like this:

// gl.zig

// `.webref` is just a random name I chose, not likely to be the actual thing
pub extern "gl" fn createShader(t: GLenum) *addrspace(.webref) WebGLShader;
const WebGLShader = opaque{
    pub extern "gl" fn shaderSource(this: *addrspace(.webref) WebGLShader, source: [*]const u8, sourceLen: usize) void;
};

// main.zig

const gl = @import("gl.zig");

const SHADER_SOURCE =
    \\ very clever fragment shader here
;

export fn main() i32 {
  const shader = gl.createShader(gl.FRAGMENT_SHADER);
  shader.shaderSource(SHADER_SOURCE.ptr, SHADER_SOURCE.len);

  return 0;
}

The address space proposal hasn't been finalized, as far as I can tell, so it will likely end up looking a bit different from this.

munrocket commented 2 years ago

Ah I see, you are right @leroycep. It seems we need addrspace to store an externref in global variables, but those are not supported right now (#4866). Anyway, maybe it's possible to store it in a table? Here are different examples with reference types.

global.wat table_get.wat table_set.wat
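As a rough illustration of the table idea from the host side, here is a sketch using the standard `WebAssembly.Table` JS API with an `externref` element type. It assumes a runtime that ships the reference-types proposal; the `shader` object is a stand-in for a real host object, not an actual WebGL handle. It shows only the JS side and says nothing about how Zig would emit `table.get`/`table.set`.

```javascript
// Assumes the runtime supports reference types (externref tables).
const table = new WebAssembly.Table({ element: "externref", initial: 2 });

// Stand-in for a host object such as a WebGLShader.
const shader = { name: "fake shader" };

// Store the object in slot 0; a module importing this table could read the
// slot with table.get and pass the reference on without inspecting it.
table.set(0, shader);

// grow() appends slots initialized to the given value and returns the
// previous length, which is the index of the first new slot.
const index = table.grow(1, { name: "another" });

console.log(table.get(0) === shader, index, table.length);
```

With a table, an integer index still crosses the boundary, but the reference itself lives in a wasm table rather than in a hand-written JS map.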

WDYT?

Luukdegram commented 2 years ago

@Luukdegram what do you think?

Think of what, exactly? I see a lot of noise here, but no concrete idea of how you want to solve this. There's a lot to consider to fully support this use case:

For the LLVM backend, we can then emit whatever it wants and do our own thing in the wasm backend, as long as both generate the semantically correct behavior we want.

Don't get me wrong. I fully support this use case and would like to see this supported in Zig, but it's not as simple as you seem to portray. I don't think we should rush support for this and should carefully consider all options. Personally, it isn't high on my TODO list right now, as stage2 is far along and I'd like to support Wasm's MVP in the wasm backend before considering the additional proposals and features.

Also, note that I'm not part of the core team. While I can and will provide my input to the core team, I'm in no position to make a decision on this.

munrocket commented 2 years ago

Sorry, I just tried to fix it by myself (I was a little bit naive here) and also attached an example that somebody can use as a reference test for an implementation.

Use case: I want to make a web engine like three.js, which is why I need a fully compatible Web API layer for audio, graphics (including new backends), and mouse events. I will do it with a code generator that can be reused later for other APIs in other Zig projects. Linking with C libraries is not a first priority, because right now the ecosystem and tooling are more important. For example, we also need to manually create glue for fetch/setTimeout/requestAnimationFrame/performance.now().

So the reasons why I am considering Reference Types in zig:

For those trying to implement glue in the old style, it will be about 6x more work and will become legacy later.

I fully support this use case and would like to see this supported in Zig, but it's not as simple as you seem to portray.

@Luukdegram thank you for the detailed response, you are 100% right that I rushed here. But if someone creates an experimental version, even one with a memory leak, it will be helpful, because building the ecosystem is somewhat orthogonal work.

Pyrolistical commented 2 years ago

In the meantime, the workaround is to pass an insecure i32 pointer which is a lookup key in JS land.

codefromthecrypt commented 2 years ago

👍 and while it isn't documented anywhere as a common practice (AFAICT), this is the way a lot of things do it, regardless of whether the host is JS or not. For example, say it is a "context" object: there would be a context ID as an i32, and the host makes sure this isn't actually mapped to memory but is instead a key into a lookup table. That way, if some code manipulates it unsafely, it fails to crash anything.

It is still insecure insofar as someone could guess another session's ID if they are in the same module instance, but then again wasm modules are not safe for concurrent use, and removing a context (clearing the key and the memory) before adding one back to the pool can prevent leaks.

Take the above with a grain of salt, because I don't work in wasm security; these are just things I noticed in how things work outside JS.
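A small sketch of the "clear the key before returning it to the pool" point, written in JS only because that is the host language used elsewhere in this thread. `ContextPool` and its methods are hypothetical names, and it assumes a context is never `undefined`. IDs index a host-side array, never guest memory, and a released ID stops resolving before it becomes available for reuse.

```javascript
// Hypothetical context pool: ids are keys into a host-side array.
class ContextPool {
  constructor() {
    this.slots = [];   // id -> context object, or undefined when free
    this.freeIds = []; // ids that can be handed out again
  }

  // Register a context and return the i32 id given to the guest.
  acquire(context) {
    const id = this.freeIds.length > 0 ? this.freeIds.pop() : this.slots.length;
    this.slots[id] = context;
    return id;
  }

  // Resolve an id from the guest; forged or stale ids throw instead of
  // touching anything.
  lookup(id) {
    const context = this.slots[id];
    if (context === undefined) {
      throw new RangeError(`unknown context id ${id}`);
    }
    return context;
  }

  // Clear the slot first, then return the id to the pool, so a reused id
  // can never expose the previous context.
  release(id) {
    this.lookup(id); // throws if the id is not live
    this.slots[id] = undefined;
    this.freeIds.push(id);
  }
}
```

This does nothing about guessable IDs within one module instance; it only keeps a recycled ID from leaking the previous owner's state.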

gcoakes commented 2 years ago

I've got a few questions about this ticket:

  1. Is address space fully implemented? It's parsed and fed into LLVM as far as I can tell. Considering that #653 is still open, I'm uncertain if it is complete.
  2. Is someone already working on it?
  3. Does my general battle plan seem correct?
    1. Rename std.wasm.Valtype to NumericType.
    2. Create std.wasm.ValueType as a tagged union of std.wasm.RefType and the above (leaving the possibility for VectorType in the future).
    3. Replace all uses of Valtype with the above union.
    4. Add all Reference Instructions^1 to src/arch/wasm/Mir.zig.
    5. Add .host to std.builtin.AddressSpace.
    6. In src/arch/wasm/CodeGen.zig, convert *addrspace(.host) anyopaque to ValueType{.RefType = .externref} anywhere it might be found (function params, instructions).
  4. Where should I be looking in the stage1 compiler in order to make these changes?

(Please don't take this as a commitment to actually implement it. This isn't my day job, and my attention span for hobby work tends to be short.)

Luukdegram commented 2 years ago

I've got a few questions about this ticket:

  1. Is address space fully implemented? It's parsed and fed into LLVM as far as I can tell. Considering that #653 is still open, I'm uncertain if it is complete.
  2. Is someone already working on it?
  3. Does my general battle plan seem correct?

    1. Rename std.wasm.Valtype to NumericType.
    2. Create std.wasm.ValueType as a tagged union of std.wasm.RefType and the above (leaving the possibility for VectorType in the future).
    3. Replace all uses of Valtype with the above union.
    4. Add all Reference Instructions^1 to src/arch/wasm/Mir.zig.
    5. Add .host to std.builtin.AddressSpace.
    6. In src/arch/wasm/CodeGen.zig, convert *addrspace(.host) anyopaque to ValueType{.RefType = .externref} anywhere it might be found (function params, instructions).
  4. Where should I be looking in the stage1 compiler in order to make these changes?

(Please don't take this as a commitment to actually implement it. This isn't my day job, and my attention span for hobby work tends to be short.)

Footnotes

  1. https://webassembly.github.io/spec/core/syntax/instructions.html#reference-instructions

Before answering your questions, I'd like to bring to your attention that no decision has been made yet with regard to the syntax or whether it's even possible to integrate the external reference feature into Zig at all. Such a decision isn't very straightforward as there are many cases to consider before this can be accepted. e.g. what should be the behavior when someone tries to @ptrCast such a type? Therefore, implementing this right now is not possible and is also the reason why the work hasn't been started yet. However, I'll still happily answer your questions:

  1. Address spaces are not fully implemented yet. However, some work has been done to support certain use cases.
  2. No; see my remark above.
  3. Your plan seems to target the native WebAssembly backend. This backend will only be used for debug mode in the future. It's also incomplete right now, which means it isn't being used outside of implementing the backend. Instead, this should probably be implemented in the LLVM backend first. The battle plan does have the basics correct for the native WebAssembly backend, but it is still missing many edge cases, such as updating other instructions to use is_null, for example, when the type is an External Reference.
  4. It is not worthwhile to implement this in the stage1 compiler. Stage2 is the new default, and stage1 will be removed in the future. Address spaces also aren't implemented at all within the stage1 compiler.

gcoakes commented 2 years ago

Thank you for your response.

this should probably be implemented in the LLVM backend first

Took me an hour or so to realize this... I implemented a naive version of the native changes and was very confused until I noticed one little line where it switched to the LLVM backend.

I have some notes I've taken since my last comment that I don't want to go to waste. @Luukdegram, though I suspect I'm just telling you things you already know, I hope someone will find them useful:

(module
    (func (export "eq_ref") (param externref externref) (result i32)
        local.get 0
        local.get 1
        eq
    )
)

jedisct1 commented 2 years ago

An RFC to support Reference Types in Clang was just published.

As well as an implementation.