replayio / gecko-dev

Record Replay gecko based browser source
https://replay.io

Support WebGL #58

Open bhackett1024 opened 4 years ago

bhackett1024 commented 4 years ago

WebGL cannot currently be used when recording/replaying.

jasonLaster commented 2 years ago

We can likely use SwiftShader to emulate OpenGL APIs when replaying, and use the hardware graphics when recording.

Chromium uses SwiftShader for software graphics, and recording/replaying it shouldn't be hard. What is hard to predict is what the interface between the two will look like, and how high- or low-level it is compared to the graphics library interface.

bhackett1024 commented 2 years ago

I looked through the SwiftShader docs a bit (https://swiftshader.googlesource.com/SwiftShader/+/HEAD/docs/Index.md), and I think it's a great fit for what we need. It's already designed to act as a shared library implementing the OpenGL ES spec, which is what WebGL is based on.

I was curious what the first obstacle here was, so I removed the code for disabling WebGL when recording/replaying (grep for ReportUnsupportedFeature) and tried recording one of the MDN demos (https://mdn.github.io/webgl-examples/tutorial/sample6/). Recording worked fine, but replaying crashed when we encountered some textures we didn't know about.

This crash is related to how we handle graphics when replaying.

In a normal firefox content process or in a recording process, graphics which the tab wants to render are rendered to textures living in that process. The textures use memory shared with the UI process, and IPDL messages are sent to the UI process describing the textures and the layer tree which they are part of. The compositor in the UI process uses those messages to manage a parent side layer tree and to do the actual drawing to the screen (note that there are more complex variations of this where a third GPU process does the drawing, but that can be disabled).

In a replaying process, we need to draw the graphics so we can show them when viewing the recording, but the UI process compositor no longer exists: we only recorded what happened in the content process. So, we create a compositor within the replaying process, and whenever the layer tree is updated we send those updates to the in-process compositor. Whenever we want the current screen contents, we ask that compositor to draw to a buffer, convert it to a JPEG, encode it as base64, and send it to the client. We don't need to do this while recording (and can avoid the associated overhead) because the compositor doesn't interact with other parts of the browser and we only need effective determinism while replaying instead of complete determinism (see https://medium.com/replay-io/effective-determinism-54cc91f5693c).
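As a rough illustration of the last step of that pipeline (encoding the JPEG bytes as base64 before they go to the client), here is a standalone sketch. It is not the code in Graphics.cpp; `EncodeBase64` is a hypothetical helper name, and the input is assumed to be an already-encoded JPEG buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a gecko API): encodes raw bytes, e.g. a JPEG
// produced by the in-process compositor, as standard base64 with '=' padding.
std::string EncodeBase64(const std::vector<uint8_t>& bytes) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve(((bytes.size() + 2) / 3) * 4);

  std::size_t i = 0;
  // Each full group of 3 input bytes becomes 4 output characters.
  for (; i + 2 < bytes.size(); i += 3) {
    uint32_t n = (uint32_t(bytes[i]) << 16) | (uint32_t(bytes[i + 1]) << 8) |
                 uint32_t(bytes[i + 2]);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }

  // One or two trailing bytes are padded out to a 4-character group.
  std::size_t rem = bytes.size() - i;
  if (rem == 1) {
    uint32_t n = uint32_t(bytes[i]) << 16;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += "==";
  } else if (rem == 2) {
    uint32_t n = (uint32_t(bytes[i]) << 16) | (uint32_t(bytes[i + 1]) << 8);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += '=';
  }

  assert(out.size() % 4 == 0);  // padded base64 is always a multiple of 4
  return out;
}
```

The text form is convenient when the screenshot travels to the client inside a text-based message, at the cost of roughly a third more bytes than the raw JPEG.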

All of this logic is managed from https://github.com/RecordReplay/gecko-dev/blob/webreplay-release/toolkit/recordreplay/Graphics.cpp

It looks like WebGL creates kinds of textures that this integration doesn't deal with yet. Adding support for the additional kinds of textures shouldn't be too hard. The main concern is making sure that graphics library calls are happening within the recording process, instead of happening within the UI or GPU process. I'm pretty sure this is the case (in the past we've had issues with recording processes trying to call OpenGL APIs) but would need to confirm. If not, we might need to do some more reorganization similar to the above to make sure the needed components are created / running within the replaying process.

Once we're able to replay recordings which use these new kinds of textures and ensure that the OpenGL calls are happening from the replaying process, we should be able to load the SwiftShader library into the replaying process and link the calls from the replaying process to the implementations in the library. Running the SwiftShader library in the replaying process shouldn't be difficult because it doesn't interact with the system, though if it has an internal thread pool (hard to tell from the docs) we might need to add support for that.

kannanvijayan commented 2 years ago

When you say that the main process "describes" the textures to the UI process, what exactly does that entail? Is it the addresses/sizes of textures generated in the main process, along with the layout info for how to composite them? Or is there some IPDL protocol for asking the UI process to allocate a new known texture which gets written into, and then after it's filled just the layout info is sent over?

bhackett1024 commented 2 years ago

> When you say that the main process "describes" the textures to the UI process, what exactly does that entail? Is it the addresses/sizes of textures generated in the main process, along with the layout info for how to composite them? Or is there some IPDL protocol for asking the UI process to allocate a new known texture which gets written into, and then after it's filled just the layout info is sent over?

This is handled through an IPDL protocol. PCompositorBridge is the top level protocol, but most of the work happens within the PLayerTransaction which it manages. PLayerTransaction::Update includes the main changes to the layer tree which the compositor needs, but I don't have a good understanding of the details around the order in which things happen.

When replaying we create the in-process CompositorBridgeParent and LayerTransactionParent here: https://github.com/RecordReplay/gecko-dev/blob/0ac92fff4262ea9a8d60e79958984aa0fcf6f5b4/toolkit/recordreplay/Graphics.cpp#L62

Changes which the replaying process makes to the layer tree are sent to both compositors, through calls like this one: https://github.com/RecordReplay/gecko-dev/blob/0ac92fff4262ea9a8d60e79958984aa0fcf6f5b4/gfx/layers/ipc/ShadowLayers.cpp#L729

The crash I saw was at this point: https://github.com/RecordReplay/gecko-dev/blob/0ac92fff4262ea9a8d60e79958984aa0fcf6f5b4/toolkit/recordreplay/Graphics.cpp#L234. There is an API recordreplay::RegisterTextureChild which the layers code is supposed to use to tell the record/replay graphics code about textures it has created, so that parent-side stuff in the replaying process can create a parent-side texture for textures it sees in IPDL messages via this recordreplay::CreateTextureHost API. When CreateTextureHost sees a texture it doesn't know about, it crashes.
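To make the registration contract concrete, here is a minimal sketch of the idea. The method names echo `RegisterTextureChild` and `CreateTextureHost`, but the types, signatures, and bodies are all hypothetical stand-ins, not the real code in Graphics.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the real gecko texture types.
using TextureId = uint64_t;
struct TextureInfo {
  std::string kind;  // e.g. a shared-memory texture, or a WebGL-specific kind
  uint32_t width;
  uint32_t height;
};

class TextureRegistry {
 public:
  // Child side: the layers code announces every texture it creates.
  void RegisterTextureChild(TextureId id, TextureInfo info) {
    assert(mTextures.count(id) == 0);  // sketch assumes ids are unique
    mTextures[id] = std::move(info);
  }

  // Parent side: called when an IPDL message mentions a texture. Returns
  // nullopt for an unregistered texture; the crash described above
  // corresponds to hitting this case and aborting.
  std::optional<TextureInfo> LookupForHost(TextureId id) const {
    auto it = mTextures.find(id);
    if (it == mTextures.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::unordered_map<TextureId, TextureInfo> mTextures;
};
```

In this picture, a WebGL canvas texture that is created without a matching registration call is exactly the one for which the lookup comes back empty.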

There may be a cleaner way of handling this with fewer record/replay specific APIs for the graphics code to call. The simpler next step though is to figure out where we're missing a call to recordreplay::RegisterTextureChild.

jasonLaster commented 2 years ago

High level

  1. Can you describe the different processes involved while recording vs replaying: content/parent/main/ui/graphics? A diagram for recording/replaying would help here.
  2. How similar is the replaying process to the recording content process?

Next steps

  1. How would we get the recording process to make the OpenGL API calls?
  2. How would we create the missing textures? Is it similar to how we currently listen for layer tree updates?

> Once we're able to replay recordings which use these new kinds of textures and ensure that the OpenGL calls are happening from the replaying process, we should be able to load the SwiftShader library into the replaying process and link the calls from the replaying process to the implementations in the library

bhackett1024 commented 2 years ago

High level

> 1. Can you describe the different processes involved while recording vs replaying: content/parent/main/ui/graphics? A diagram for recording/replaying would help here.

This is a good technical overview of multiprocess firefox: https://billmccloskey.wordpress.com/2013/12/05/multiprocess-firefox/ See the "Drawing" section in particular.

The simplest setup for e10s firefox has two kinds of processes: the UI process is the privileged one which manages the UI and only loads trusted content, while content processes load untrusted web content and communicate with the UI process over IPC.

> 2. How similar is the replaying process to the recording content process?

They are the same thing, just linked differently. See https://medium.com/replay-io/recording-and-replaying-d6102afee273

Next steps

> 1. How would we get the recording process to make the OpenGL API calls?

I don't know. Someone needs to understand the graphics stack better than I currently do (which could be me, but we need to decide on when the appropriate time for that is).

Looking more closely at this, it looks to me like the recording process is probably not making OpenGL calls: we disallow loading unknown dynamic libraries through dlopen while recording, which I think would be necessary to call into OpenGL (but maybe not). So, the biggest blocker here really is to get a solid understanding of what the graphics stack looks like when using WebGL canvases in Firefox (using the older pre-WebRender stuff, as we disable WebRender).

> 2. How would we create the missing textures? Is it similar to how we currently listen for layer tree updates?

I think there are different kinds of textures used for WebGL canvases which are created by the recording content process and aren't being registered. Handling that may just require more calls to recordreplay::RegisterTextureChild, and changes to RegisterTextureChild and associated code to handle any differences vs. the existing textures we deal with.

> * How would we know that we are ready to start working with SwiftShader? Is it that we can replay without crashing?

The best way of approaching this IMO is to configure the recording process to load a compositor and separately draw its frames to JPEGs and put them into a directory. We have logic for this already, see MaybeCreateCurrentPaintFile in Graphics.cpp (this was added to investigate https://github.com/RecordReplay/gecko-dev/issues/292). If we can turn that on while recording, load a page that uses WebGL, and get it to correctly draw its frames to JPEGs in that directory, then we'll be ready to do that while replaying.

Figuring out how this stuff works is much easier to do while recording than while replaying.

> * At a high-level what does linking the calls to the implementation look like? How does this compare to what we do now with canvas or similar in-cpu compositors (Skai)?

The main difference vs Skia (note the spelling) is that Skia is compiled into gecko and chromium and participates in the recording. If we link to swiftshader while replaying then we'll be running code that didn't originally run while recording, so it can't interact with the recording. That shouldn't be a problem unless swiftshader has complex system interactions, like creating its own thread pool.

kannanvijayan commented 2 years ago

> The main difference vs Skia (note the spelling) is that Skia is compiled into gecko and chromium and participates in the recording. If we link to swiftshader while replaying then we'll be running code that didn't originally run while recording, so it can't interact with the recording. That shouldn't be a problem unless swiftshader has complex system interactions, like creating its own thread pool.

Let me see if I understand this correctly.

If we want to run SwiftShader as an alternate GL backend during replay, it's fine as long as:

  1. SwiftShader doesn't touch system APIs that would otherwise interact with the recording/replaying behaviour (i.e. consume data from the recording data stream)
    • If it does touch system APIs, we can surround the calls into that code with some flag that indicates that any calls done during that duration should not trigger reading from the recording stream.
  2. If SwiftShader does do complex things like start threads, then can we modify it to mark those threads as special and to be ignored for our purposes, keeping a replay-specific table of such threads to ignore?

bhackett1024 commented 2 years ago

> > The main difference vs Skia (note the spelling) is that Skia is compiled into gecko and chromium and participates in the recording. If we link to swiftshader while replaying then we'll be running code that didn't originally run while recording, so it can't interact with the recording. That shouldn't be a problem unless swiftshader has complex system interactions, like creating its own thread pool.
>
> Let me see if I understand this correctly.
>
> If we want to run SwiftShader as an alternate GL backend during replay, it's fine as long as:
>
> 1. SwiftShader doesn't touch system APIs that would otherwise interact with the recording/replaying behaviour (i.e. consume data from the recording data stream)
>    * If it _does_ touch system APIs, we can surround the calls into that code with some flag that indicates that any calls done during that duration should not trigger reading from the recording stream.
> 2. If SwiftShader _does_ do complex things like start threads, then can we modify it to mark those threads as special and to be ignored for our purposes, keeping a replay-specific table of such threads to ignore?

Yeah, this is all right.
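A minimal sketch of the two mechanisms discussed above, a scoped "don't read from the recording" flag and a table of ignored threads, might look like this. Every name here (`AutoPassThrough`, `IgnoredThreads`, `ShouldConsultRecording`) is hypothetical and chosen for illustration; none of it is the real record/replay API.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <unordered_set>

namespace sketch {

// Per-thread nesting depth. Nonzero means calls made right now should not
// trigger reads from the recording stream.
thread_local int gPassThroughDepth = 0;

// RAII guard to wrap around calls into the replay-only library.
class AutoPassThrough {
 public:
  AutoPassThrough() { ++gPassThroughDepth; }
  ~AutoPassThrough() {
    assert(gPassThroughDepth > 0);
    --gPassThroughDepth;
  }
  AutoPassThrough(const AutoPassThrough&) = delete;
  AutoPassThrough& operator=(const AutoPassThrough&) = delete;
};

// Replay-specific table of threads that the library started on its own.
class IgnoredThreads {
 public:
  void Add(std::thread::id id) {
    std::lock_guard<std::mutex> lock(mMutex);
    mIds.insert(id);
  }
  bool Contains(std::thread::id id) {
    std::lock_guard<std::mutex> lock(mMutex);
    return mIds.count(id) != 0;
  }

 private:
  std::mutex mMutex;
  std::unordered_set<std::thread::id> mIds;
};

inline IgnoredThreads& GetIgnoredThreads() {
  static IgnoredThreads table;
  return table;
}

// A wrapper around a system API would check this before touching the
// recording stream.
inline bool ShouldConsultRecording() {
  return gPassThroughDepth == 0 &&
         !GetIgnoredThreads().Contains(std::this_thread::get_id());
}

}  // namespace sketch
```

The depth counter (rather than a plain boolean) lets guarded regions nest, and keeping the flag thread-local means a guarded call on one thread does not affect recorded threads running alongside it.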

kannanvijayan commented 2 years ago

I wanted to add some updates after our talk with the Chrome/SwiftShader WebGL+WebGPU folks, and some Twitter conversations I noticed us getting into with WebGL devs.

Overall, Ken @ Google thought that this was basically impossible and kept coming around to the idea of actually recording/replaying the GL commands being issued at some level. I don't think this is feasible for us.

Correct me if I'm wrong, but the way I envision this feature is that we would use the rest of the system (effective determinism, etc.) to ensure that we issue the same high-level WebGL calls during replay that we issue during recording, and switch out the backend during replay to use a software implementation to do the actual rendering.

This would preserve our ability to do recording with a hardware WebGL implementation.
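To illustrate the shape of that idea: the same high-level calls are issued in both modes, and only the backend that renders them differs. The sketch below uses a made-up `GLBackend` interface with a single `DrawArrays` entry point; the class names and the `MakeBackend` factory are assumptions for illustration and do not reflect how gecko or SwiftShader are actually structured.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface standing in for the GL entry points WebGL calls into.
class GLBackend {
 public:
  virtual ~GLBackend() = default;
  virtual std::string Name() const = 0;

  // The high-level call is logged identically for every backend; only the
  // Render step differs.
  void DrawArrays(int first, int count) {
    assert(count >= 0);
    mCalls.push_back("drawArrays(" + std::to_string(first) + "," +
                     std::to_string(count) + ")");
    Render(first, count);
  }

  const std::vector<std::string>& Calls() const { return mCalls; }

 protected:
  virtual void Render(int first, int count) = 0;

 private:
  std::vector<std::string> mCalls;
};

class HardwareBackend : public GLBackend {
 public:
  std::string Name() const override { return "hardware"; }

 protected:
  void Render(int, int) override { /* would forward to the system GL driver */ }
};

class SoftwareBackend : public GLBackend {
 public:
  std::string Name() const override { return "software"; }

 protected:
  void Render(int, int) override { /* would forward to a software renderer */ }
};

enum class Mode { Recording, Replaying };

// Recording keeps hardware rendering; replaying swaps in the software path.
std::unique_ptr<GLBackend> MakeBackend(Mode mode) {
  if (mode == Mode::Replaying) {
    return std::make_unique<SoftwareBackend>();
  }
  return std::make_unique<HardwareBackend>();
}
```

If effective determinism holds, the call log in the two modes should match even though the pixels are produced by different implementations.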

Anyway, I saw the following tweet a few days ago in a convo with the replay.io twitter account: https://twitter.com/modeless/status/1471882912991956994?s=20

I don't understand where this comment is coming from at all. Changing the implementation of WebGL in the backend to software should not really affect the behaviour of the API calls at the high-level javascript layer, right?

What am I missing?

gideonred commented 2 years ago

> What am I missing?

My understanding of the tweet: Replay.io will be using SwiftShader, which may be "100 times slower" than hardware-accelerated WebGL. This will make the recording experience terrible. Also, it may cause timing-sensitive code to behave very differently (badly written code may behave very differently when everything is running 10x slower).

My take on it:

gideonred commented 2 years ago

You can force the use of swiftshader by running chrome with --disable-gpu or by going into your chrome settings and disabling hardware acceleration.

I quickly tried it on my M1 Pro.

Here's a video of a webgl demo that's usually butter smooth 60fps running in swiftshader at about 10fps.

https://user-images.githubusercontent.com/28845582/147588321-06658846-da2a-4200-8739-b4540a520b00.mov

I also tried two video games: princejs.com (Prince of Persia) and krunker.io (Quake 3-ish). krunker was almost playable.

ceceliacreates commented 2 years ago

Adding a +1 here, received an email from a user that ran into this issue.

gideonred commented 2 years ago

This ticket has been moved to RUN-181

sedghi commented 1 year ago

Do I need to create a Linear account to follow this issue?

jasonLaster commented 1 year ago

Hi @sedghi, nope we'll post updates to this issue.

davidshq commented 1 year ago

Any updates on this? I'm running into this trying to record a session on a web app that uses VTK.js

kannanvijayan commented 1 year ago

> Any updates on this? I'm running into this trying to record a session on a web app that uses VTK.js

This is on the back burner at the moment while we stabilize Chromium in the near term.