servo / servo

Servo, the embeddable, independent, memory-safe, modular, parallel web rendering engine
https://servo.org
Mozilla Public License 2.0

Implement ReadableStream support #21482

Closed jdm closed 4 years ago

jdm commented 6 years ago

This requires updating the WebIDL parser at minimum, and treating it as a *mut JSObject for the code generation.

jdm commented 6 years ago

https://searchfox.org/mozilla-central/source/dom/fetch/Fetch.cpp is a good place to use to see examples of C++ code using ReadableStream APIs that the JS engine provides.

pshaughn commented 4 years ago

#24876 relates to this and is probably a prerequisite for it.

pshaughn commented 4 years ago

Some WPT tests want to see the ReadableStream constructor itself, in addition to the various ones that expect a body.getReader().

gterzian commented 4 years ago

@jdm could you please provide more details on what you mean with:

  1. updating the WebIDL parser
  2. treating it as a *mut JSObject for the code generation

I've taken a look at bindings/codegen and it's not immediately clear to me where one would start making those changes. Do we have examples of PRs involving similar work?

On another note, I was wondering whether WebIDL/codegen changes are necessary at all, because SM gives us those low-level APIs at https://github.com/servo/mozjs/blob/e21c05b415dfc246175ff8d5fc48b0e8c5b4e9e9/mozjs/js/public/Stream.h, and those APIs all operate on things like HandleObject or JSObject, so I was wondering if we actually need a stream as a native Rust object.

On the other hand, I do think we will need to implement various ReadableStreamUnderlyingSources in Rust. For that, I think we need to take a similar approach to the JobQueue implementation, using a new C++ class in rust-mozjs like RustJobQueue.

jdm commented 4 years ago

WebIDL parser and codegen changes are only necessary for WebIDL interfaces that include ReadableStream. If you're only concentrating on using SM-based streams inside the engine and not exposing them to JS, then they aren't necessary.

If WebIDL changes are desired, then the initial codegen can expose the stream objects as *mut JSObject and HandleObject, as you suggest.

gterzian commented 4 years ago

Ok I'm confused by this one.

So normally, we use a .webidl file to describe the interface between a Rust struct and JS. The code is actually written in Rust, and there is a layer of generated bindings between the Rust and SpiderMonkey.

So I get that (I think).

However this time, the actual code is already implemented by SM. Yet we will still need:

  1. a ReadableStream() constructor being available to JS (it seems this isn't the case now, unlike other JS objects, which have a constructor without requiring a webidl).
  2. We want to be able to implement our own streams, where the Rust code acts as the underlying source (see the APIs offered by SM at https://github.com/servo/mozjs/blob/e21c05b415dfc246175ff8d5fc48b0e8c5b4e9e9/mozjs/js/public/Stream.h#L29).

So, for 1, how can we use the code from SM found over at https://github.com/servo/mozjs/blob/e21c05b415dfc246175ff8d5fc48b0e8c5b4e9e9/mozjs/js/src/builtin/Stream.h (note this is the src folder, not the public one linked to earlier), yet have a webidl?

For 2, I think we can just have a C++ class binding to a Rust struct (as is done with https://github.com/servo/rust-mozjs/blob/9b0d063ba062f4cc60c3bab9250218d6935d647b/src/jsglue.cpp#L42), which would implement the ReadableStreamUnderlyingSource interface, and then directly use the various APIs made available by SM?

gterzian commented 4 years ago

For example, if I do new Promise(function() {}) in Servo, it works, yet if I do new ReadableStream(), I get ReadableStream is not defined.

Why does it require a webidl, if SM is already implementing it?

gterzian commented 4 years ago

Ah actually we do have a Promise Webidl and a Rust struct... who knew!

gterzian commented 4 years ago

Ok this is what I will try to do:

  1. Setup a ReadableStream webidl
  2. In the constructor, use NewReadableDefaultStreamObject, passing the arguments received by the constructor.
  3. Store the JSObject on the Rust struct.
  4. Then for each method (close, cancel, tee, and so on), use the corresponding SM API, like ReadableStreamCancel, passing a handle to the JS object stored in step 3.

Then I guess the Rust struct could just have another method to create a stream backed by a Rust underlying source, this time using NewReadableExternalSourceStreamObject. You'd still end up with a JSObject, and in this case also with a corresponding Rust object that would implement ReadableStreamUnderlyingSource (which I think would require an intermediary C++ class, as is done with PromiseJobQueue).

So basically, we don't use the code in the js/src directory; we use the functions in js/public/Stream.h from within the Rust struct.
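
To make the plan concrete, here is a rough sketch of what steps 2-4 could look like on the Rust side. It assumes the mozjs crate exposes the js/public/Stream.h entry points (NewReadableDefaultStreamObject, ReadableStreamCancel) through its js::jsapi bindings, and that the usual rooted!/Heap machinery produces the handles they expect; the exact argument lists and handle conversions would need to be checked against the pinned mozjs revision.

```rust
use std::ptr;

use js::jsapi::{Heap, JSContext, JSFunction, JSObject};
// Assumed to be generated from js/public/Stream.h by the mozjs bindings.
use js::jsapi::{NewReadableDefaultStreamObject, ReadableStreamCancel};
use js::jsval::UndefinedValue;

/// Step 3: a Rust struct that owns the stream object created by SM.
pub struct ReadableStream {
    /// Kept in a Heap so the GC knows about it; only set once the struct
    /// has its final address.
    js_stream: Heap<*mut JSObject>,
}

impl ReadableStream {
    /// Step 2: create the underlying SM stream. The underlying source and
    /// size function would come from the constructor arguments; they are
    /// left null here to keep the sketch short.
    pub unsafe fn new(cx: *mut JSContext) -> Box<ReadableStream> {
        rooted!(in(cx) let source = ptr::null_mut::<JSObject>());
        rooted!(in(cx) let size = ptr::null_mut::<JSFunction>());
        rooted!(in(cx) let proto = ptr::null_mut::<JSObject>());
        // `.into()` stands in for whatever rust-handle to raw-handle
        // conversion the mozjs version in use provides.
        let stream = NewReadableDefaultStreamObject(
            cx,
            source.handle().into(),
            size.handle().into(),
            1.0, // high-water mark
            proto.handle().into(),
        );
        let this = Box::new(ReadableStream { js_stream: Heap::default() });
        this.js_stream.set(stream);
        this
    }

    /// Step 4: forward cancel() to the corresponding SM API, re-rooting the
    /// stored object to get the HandleObject/HandleValue the API expects.
    pub unsafe fn cancel(&self, cx: *mut JSContext) {
        rooted!(in(cx) let stream = self.js_stream.get());
        rooted!(in(cx) let reason = UndefinedValue());
        ReadableStreamCancel(cx, stream.handle().into(), reason.handle().into());
    }
}
```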

jdm commented 4 years ago

That sounds like a reasonable plan!

gterzian commented 4 years ago

@jdm Do you have an idea how to work with JSFunction and the corresponding HandleFunction?

The SM API expects a HandleFunction for the size function that can be passed to the ReadableStream constructor.

I tried using Function as a webidl argument to the constructor, as is done with setTimeout for example, and that still seems to use a JSObject under the hood, not a JSFunction, so to_jsval seems to return a handle to an object, not a function, and then the SM API complains.

Would there be a way to get a *mut JSFunction passed to the constructor, instead of a Function? Or is there a way to go from Function to JSFunction? Or from object to function?

See https://github.com/servo/servo/pull/25827/files#diff-4a2b21dadec30a5cccff658e252da1a5R46

By the way, I am also wondering if it's a good idea to get those raw *mut JSObject as arguments to the constructor. It seems they then have to be put into a Heap to get a handle, since the SM API expects a HandleObject...
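
For what it's worth, a raw *mut JSObject can usually be turned into the HandleObject that the SM APIs want either by rooting it on the stack with rooted!, or, for longer-lived storage, by keeping it in a Heap. A minimal sketch, assuming the usual mozjs rooting helpers (rooted!, Heap::boxed) are in scope:

```rust
use js::jsapi::{Heap, JSContext, JSObject};

unsafe fn use_raw_object(cx: *mut JSContext, raw: *mut JSObject) {
    // Short-lived use: root the raw pointer on the stack; obj.handle() then
    // yields a HandleObject that can be passed on to the SM stream APIs.
    rooted!(in(cx) let obj = raw);
    let _handle = obj.handle();

    // Longer-lived storage (e.g. a field on a DOM struct): keep the pointer
    // in a Heap<*mut JSObject> so the GC knows about it, and root it again
    // (or take a handle from the Heap) when calling into SM later.
    let slot: Box<Heap<*mut JSObject>> = Heap::boxed(raw);
    let _stored = slot.get();
}
```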

gterzian commented 4 years ago

Ok, I think I will accept an object as an argument to the constructor, and then:

  1. Make sure it's a function using JS_ObjectIsFunction(JSObject* obj)
  2. Turn it into a function using JSFunction* JS_ValueToFunction(JSContext* cx, JS::HandleValue v);

Or perhaps go from the Function, to a HandleValue, and then to a JSFunction via JS_ValueToFunction (and then use rooted! to get a handle to it?).

So the question is whether to use *mut JSObject or Rc<Function> as the argument to the constructor, and then how to end up with a HandleFunction to pass to the SM API call.

At this point it's trial and error for me...
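
In case it's useful, one possible shape for the second option, assuming the constructor argument arrives as a HandleValue and that JS_ValueToFunction is reachable through the mozjs bindings (handle conversions, again, depend on the mozjs version in use):

```rust
use js::jsapi::{JSContext, JSFunction, JS_ValueToFunction};
use js::rust::HandleValue;

/// Hypothetical helper: turn a JS value that is expected to be a function
/// into a rooted JSFunction, so that its handle can be passed where the SM
/// stream constructor wants a HandleFunction.
unsafe fn root_size_function(cx: *mut JSContext, size: HandleValue) {
    // JS_ValueToFunction returns null if the value is not callable; real
    // code would throw a TypeError instead of asserting.
    let func: *mut JSFunction = JS_ValueToFunction(cx, size.into());
    assert!(!func.is_null());
    rooted!(in(cx) let size_fn = func);
    // size_fn.handle() can now serve as the HandleFunction argument to
    // NewReadableDefaultStreamObject (modulo raw-handle conversion).
    let _ = size_fn.handle();
}
```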

gterzian commented 4 years ago

Hang on, I think I'm starting to get it, see https://github.com/servo/servo/pull/25827/commits/a1ac06bc3cf6d01f5d85381daaf3a306b1647384

Next I'll try to remove the use of Heap on the stack, and see if I can get something rooted as an argument to the constructor call, instead of *mut JSObject.

gterzian commented 4 years ago

Ok, I think this is actually going pretty well (although I haven't run anything, so it might all crash); see the WIP at https://github.com/servo/servo/pull/25827

One thing I noticed is that there are a bunch of other objects, for example ReadableStreamDefaultReader. I started making an interface for that as well, and then I realized it might not be such a good idea, and it might be better to just return a JSObject to script.

For example there is a comment that reads:

C++ equivalent of reader.cancel(reason). (This function is meant to support using C++ to read from streams. It's not meant to allow C++ code to operate on readers created by scripts.)

(https://github.com/servo/mozjs/blob/e21c05b415dfc246175ff8d5fc48b0e8c5b4e9e9/mozjs/js/public/Stream.h#L427)

and also, those reader functions that SM exposes don't seem to give you everything that would be required if we had our own Rust struct and exposed it as an interface to script.

Also, it's probably better not to call back and forth between Rust and JS if it's not necessary.

So does it sound ok to just return the JS object to script, and not do everything with a webidl?

gterzian commented 4 years ago

Ok, so what I have so far crashes in many different ways when trying to run the test suite, and I'm starting to think there must be a better way.

In gecko they seem to only have this: https://dxr.mozilla.org/mozilla-central/source/dom/bindings/ReadableStream.h, which is used in the codegen like https://dxr.mozilla.org/mozilla-central/source/dom/bindings/Codegen.py#1393, and seems to rely on https://dxr.mozilla.org/mozilla-central/source/dom/bindings/SpiderMonkeyInterface.h

Using js::friendapi::UnwrapReadableStream would appear to be key.

Here's the gecko patch to Codegen.py: https://bugzilla.mozilla.org/page.cgi?id=splinter.html&ignore=&bug=1128959&attachment=8883236

The full gecko issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1128959

jdm commented 4 years ago

I agree - the WebIDL shouldn't be necessary, because the JS object already exposes the correct prototype hierarchy that matches what the web platform requires. Translating the ReadableStream type in codegen into a JSObject should be a reasonable start.

gterzian commented 4 years ago

Ok, I've looked more into codegen, and I now see roughly how I could add some code integrating ReadableStream (it would seem important to add some code handling ReadableStream above https://github.com/servo/servo/blob/756cf66cd2f134edf4b3a14226a3028650af9379/components/script/dom/bindings/codegen/CodegenRust.py#L921).

And there's one thing still baffling me: the constructor. How is it that new ReadableStream() currently gives an error, and how can we make that call into SM? Is that dealt with at the codegen level? That's not something I'm actually seeing anywhere...

jdm commented 4 years ago

That's from a RealmOptions setting: https://doc.servo.org/mozjs/jsapi/struct.RealmCreationOptions.html#structfield.streams_
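
For reference, a minimal sketch of what flipping that flag looks like, using the field names from the linked docs (the surrounding realm-creation code in Servo is more involved than this):

```rust
use js::jsapi::RealmOptions;

// Enable SpiderMonkey's ReadableStream implementation for newly created
// realms by setting the streams_ creation option; these options are the
// ones eventually passed to JS_NewGlobalObject.
fn new_realm_options() -> RealmOptions {
    let mut options = RealmOptions::default();
    options.creationOptions_.streams_ = true;
    options
}
```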

gterzian commented 4 years ago

Awesome!

gterzian commented 4 years ago

Another question: What is the best way to implement this ReadableStreamUnderlyingSource class in Rust: https://github.com/servo/mozjs/blob/e21c05b415dfc246175ff8d5fc48b0e8c5b4e9e9/mozjs/js/public/Stream.h#L79

Can I just have a struct with the same method names, cast it to a *mut ReadableStreamUnderlyingSource, and pass it to SM API calls?

It's similar to what was done with the JobQueue here: https://github.com/servo/rust-mozjs/blob/c1e5ad8de9f4f2c4c09eb033c40ebf1020b107fa/src/jsglue.cpp#L42

However, this time I don't think I need static "traps", but rather an actual "instance" of a struct.

pshaughn commented 4 years ago

My guess here is that to make the virtual methods work, you'd need to make a C++ class that derives from ReadableStreamUnderlyingSource and wraps the Rust methods in its C++ methods. Looking at JobQueue, you could probably implement it using the same traps pattern.
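
To make the traps idea concrete: on the Rust side that pattern usually boils down to a #[repr(C)] table of extern "C" function pointers plus an opaque pointer back to the Rust instance, with a small C++ glue class forwarding the ReadableStreamUnderlyingSource virtual methods to the table. The names below (ReadableStreamUnderlyingSourceTraps, the trampoline functions, NativeSource) are hypothetical glue that would have to be added to rust-mozjs, mirroring how JobQueueTraps/CreateJobQueue work; only the general shape is intended.

```rust
use std::os::raw::c_void;

use js::jsapi::JSContext;

// Hypothetical mirror of a C++ glue struct: one function pointer per virtual
// method of JS::ReadableStreamUnderlyingSource (abridged here). The C++ glue
// class would hold this table plus the opaque `source` pointer and forward
// each virtual call through it.
#[repr(C)]
pub struct ReadableStreamUnderlyingSourceTraps {
    pub request_data:
        Option<unsafe extern "C" fn(source: *const c_void, cx: *mut JSContext, desired_size: usize)>,
    pub cancel: Option<unsafe extern "C" fn(source: *const c_void, cx: *mut JSContext)>,
    pub finalize: Option<unsafe extern "C" fn(source: *const c_void)>,
}

// The actual Rust "instance" backing the stream.
pub struct NativeSource {
    pub chunks: Vec<Vec<u8>>,
}

// Static trampolines: recover the Rust instance from the opaque pointer and
// call into it, so the traps table itself can stay 'static.
unsafe extern "C" fn request_data(source: *const c_void, _cx: *mut JSContext, _desired: usize) {
    let source = &*(source as *const NativeSource);
    // ... enqueue the next chunk from `source.chunks` via the SM stream APIs ...
    let _ = source;
}

unsafe extern "C" fn cancel(_source: *const c_void, _cx: *mut JSContext) {}

unsafe extern "C" fn finalize(source: *const c_void) {
    // Reclaim the Box handed over when the source was registered.
    drop(Box::from_raw(source as *mut NativeSource));
}

pub static UNDERLYING_SOURCE_TRAPS: ReadableStreamUnderlyingSourceTraps =
    ReadableStreamUnderlyingSourceTraps {
        request_data: Some(request_data),
        cancel: Some(cancel),
        finalize: Some(finalize),
    };

// A hypothetical glue constructor (the stream analogue of CreateJobQueue)
// would then be called roughly like:
//   CreateReadableStreamUnderlyingSource(
//       &UNDERLYING_SOURCE_TRAPS,
//       Box::into_raw(Box::new(NativeSource { chunks: vec![] })) as *const c_void,
//   );
```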

gterzian commented 4 years ago

Little update on what I'm doing over at https://github.com/servo/servo/pull/25873

So basically just trying to integrate the stream workflow with other parts of the engine, mostly fetch.

I'm starting to realize there are parts where we probably want to skip the JS API, since there seems to be no point in shuffling data from native to JS and then back to native, but for now I'm mostly trying to use the JS APIs everywhere. We could later have some "fast native path" for these operations. Also, currently we sometimes have a Vec<u8> as the "source", however that's mostly a result of reading some data into memory (for example when a Blob is the body of a request). So obviously we could replace the Vec<u8> with some sort of streaming source, and then it's not so weird anymore. For now I'm keeping the vec in some places just to make integration easier.

Also, this is actually a pretty fundamental change to how we do some of the file operations, and fetch.

Currently, the "request body" is always a Vec<u8>, which we first read into memory from some source, and then we IPC it over to Fetch. With this, the body is a stream, and it's up to Fetch to actually pull chunks from it. I haven't looked too much into responses yet, but they are sort of "streamed" already as they come in over the network.

With regard to file operations, this can also turn things on their head. Currently we just send a message containing an IPC sender to the file manager, and the manager then reads the file as fast as possible and "pushes" it all over the IPC sender. With a stream, we could instead have the stream send an IPC message to signal that it wants another chunk, and only then would the file manager "read a chunk" and send it over IPC.

So as a first step I think we can just integrate streams with the current setup, but it's sort of "useless", since we'd just be pushing data down into the stream and then resolving a promise, which doesn't really require a stream at all. So the stream would basically just add some JS overhead to the operation. But the power of streams is that the reader pulls chunks from it, and you get this internal backpressure, which only makes sense if script is also "pulling" chunks via the file manager.

So anyway, the current work is mostly to get the basic integration to work, and sometimes this is actually just going to add overhead to the existing operation. But it opens the door to "doing things properly" where it would suddenly make sense...
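
To illustrate the pull-based file-manager idea (purely a sketch, with made-up message names and std channels standing in for ipc-channel): the stream's underlying source only asks the file manager for the next chunk when the reader actually wants one, and the file manager reads and replies one chunk at a time, which is where the backpressure comes from.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical messages between the underlying source (script side) and the
// file manager; in Servo these would travel over ipc-channel instead.
enum ToFileManager {
    RequestChunk { reply: Sender<FromFileManager> },
}

enum FromFileManager {
    Chunk(Vec<u8>),
    Done,
}

// Stand-in for the file manager: it only reads (here: pops) a chunk when one
// is actually requested, instead of pushing the whole file as fast as it can.
fn file_manager(rx: Receiver<ToFileManager>) {
    let mut remaining: Vec<Vec<u8>> = vec![b"hello ".to_vec(), b"world".to_vec()];
    while let Ok(ToFileManager::RequestChunk { reply }) = rx.recv() {
        let msg = if remaining.is_empty() {
            FromFileManager::Done
        } else {
            FromFileManager::Chunk(remaining.remove(0))
        };
        if reply.send(msg).is_err() {
            break;
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    thread::spawn(move || file_manager(rx));

    // The underlying source's pull step: request exactly one chunk at a time,
    // so the producer never runs ahead of the reader (backpressure).
    loop {
        let (reply_tx, reply_rx) = channel();
        tx.send(ToFileManager::RequestChunk { reply: reply_tx }).unwrap();
        match reply_rx.recv().unwrap() {
            FromFileManager::Chunk(bytes) => println!("enqueue {} bytes", bytes.len()),
            FromFileManager::Done => break,
        }
    }
}
```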

gterzian commented 4 years ago

Ok, so I think I will limit the initial PR to two things (in addition to the basic setup):

  1. Integrate streams in Request bodies.
  2. Integrate streams with reading from a (file-based) blob (where the resulting stream can then be part of a request body).

Because if I only do 1, then the stream is basically just wrapping a Vec<u8> (or a JS-provided stream), so I think it's worth actually doing the blob reading, since that gives us a proper "stream backed by a native underlying source".

So that leaves integrating into Response bodies, and probably a ton of other stuff....