How to best re-create the encoder for resize-on-the-fly

Nexarian commented 2 years ago

Prompted by discussions in https://github.com/neutrinolabs/xrdp/pull/2175 and attempted to be addressed here, a question has arisen as to the best way to re-constitute sessions that utilize an encoder (RFX, RFX Progressive, H264).

@Nexarian wrote this to delete the encoder at the beginning of the resizing state machine, but this could cause issues with the state of the module manager (xrdp_mm).

This issue is to continue the discussion so that the PR https://github.com/neutrinolabs/xrdp/pull/2175 can be unblocked.

aquesnel commented 2 years ago

Hi @Nexarian,

Your write up in the wiki on EGFX architecture and Dynamic Monitors is very helpful to understand the integration of encoding and resizing.

Before we discuss the issue of stopping and recreating the encoder I still need to learn more about the encoder and how it integrates with xrdp/xorgxrdp/xorg. Can you please help me learn about the encoder interface that you've designed as well as the encoder interface provided by the OpenGL/DRI3 or NVENC/QuickSync libraries that you are using? Can you please share some code references where the xorgxrdp_helper does the encoding, and links to the library docs that are being used.

Thanks

Nexarian commented 2 years ago

First off I need to continue to re-iterate that the design and implementation of nearly all of this belongs to @jsorg71. My value add is in understanding, documenting, testing, rebasing, and debugging some pieces of it as well as adding dynamic resize and promoting this! So when you say it's "mine" it's very much not. I'm just someone who is passionate about this and trying to move it forward in whatever way I can, mostly faking it until I maybe make it?

This stuff is complex and niche, far beyond what you learn in school or at many typical developer jobs, and it's taken me a long time to learn what I've discovered! My hope is that by broadly communicating this hard-fought knowledge I'll create a space where many more can help contribute here.

Now that this is out of the way, here's some answers:

Anything that uses RFX (RemoteFX, an open but proprietary JPEG2000 based encoding created by Microsoft) or H264 will need an encoder. devel today has non-Progressive RFX encoder (librfxcodec), and the performance of this is superior to what I believe is called raster "fastpath" which is what most people are still using today. Some customers of XRDP have told me they use fastpath with 16 bit color just to save bandwidth and make it even remotely reasonable.

The encoder's job is to take the RAW RGB data from the video driver's memory and reformulate it into a more compressed format. The raw RGB data comes from a part of the xorgxrdp module that takes "snapshots" of the X session whenever a draw operation happens OR a damage signal is sent (damage meaning something changed): https://github.com/neutrinolabs/xorgxrdp/blob/devel/module/rdpCapture.c#L1085

In the EGFX branch(es), there are two encoders, both operate under similar principles. Let's take a look at the create function in XRDP: https://github.com/neutrinolabs/xrdp/blob/devel/xrdp/xrdp_encoder.c#L51

It initializes the proper codec based on the capabilities that were negotiated during initial connection setup, and then starts a thread that processes those messages externally. This is known as an unchained protocol
Now something has to feed the queue. What is that? This happens when a "paint" signal is received from xorgxrdp: https://github.com/neutrinolabs/xrdp/blob/devel/xrdp/xrdp_mm.c#L3439 and https://github.com/neutrinolabs/xorgxrdp/blob/devel/module/rdpClientCon.c#L2445. When the encoder is done, another part of the module manager takes the completed encoded binary blob and ships it to the client as a PDU when signaled (https://github.com/neutrinolabs/xrdp/blob/devel/xrdp/xrdp_mm.c#L2990)
Finally, when we're ready to delete the encoder, this is a blocking call that waits for everything to be finished: https://github.com/neutrinolabs/xrdp/blob/devel/xrdp/xrdp_encoder.c#L142

Ok, so that was XRDP. But in the EGFX branch(es) there is a new animal, the xorgxrdp_helper (Or what I like to call the acceleration_assistant)

This replaces the part of the XRDP encoder that does the heavy lifting. The XRDP encoder still exists in this mode, but it only checks to see if the encoding has happened, and then simply memcpy's the finished product to be shipped out as a PDU: https://github.com/Nexarian/xrdp/blob/mainline_merge/xrdp/xrdp_encoder.c#L682

The xorgxrdp_helper man-in-the-middles the signals that normally would go from xrdp <-> xorgxrdp, so being super clear: xrdp <-> xorgxrdp_helper <-> xorgxrdp. It's initialized when these conditions are met: https://github.com/Nexarian/xorgxrdp/blob/mainline_merge/module/rdpClientCon.c#L401, https://github.com/Nexarian/xorgxrdp/blob/mainline_merge/module/rdpClientCon.c#L1326, and https://github.com/Nexarian/xorgxrdp/blob/mainline_merge/module/rdpClientCon.c#L952

and the functions that process the signaling and intercept them and then forward them are here:

https://github.com/Nexarian/xrdp/blob/mainline_merge/xorgxrdp_helper/xorgxrdp_helper.c#L160 and https://github.com/Nexarian/xrdp/blob/mainline_merge/xorgxrdp_helper/xorgxrdp_helper.c#L282

and encoding is initialized here: https://github.com/Nexarian/xrdp/blob/mainline_merge/xorgxrdp_helper/xorgxrdp_helper.c#L37

NVENC encoding is the only one I've tested, and that's done here: https://github.com/Nexarian/xrdp/blob/mainline_merge/xorgxrdp_helper/xorgxrdp_helper_nvenc.c#L290, and gated with a configure flag at build time (not the best, but this is a prototype).

Note that during resize, I only shut down the XRDP encoder, because the goal is to prevent bad data from being packaged up into a PDU and sent to the client, because Microsoft's clients will "black screen" the session if they get even a single frame that has something wrong with it. FreeRDP is more flexible, however.

As to OpenGL/DRI3. DRI3 allows for the rendering of the X session to live in video memory for all but Nvidia (because Nvidia chose to go a different direction, which is related to why Nvidia drivers on Linux are so difficult to integrate with). However, just because the X session is rendered with the video drivers doesn't mean the ENCODING is done there, they are separate. In practice, using DRI3 but no H264 encoding doesn't net you much performance. Encoding with progressive RFX and/or H264 is what really speeds it up. OpenGL can be used to get around this for Nvidia with nvidia_hack described in the wiki. OpenGL has a way to directly access video memory, be it DRI3 OR Nvidia.

As to QuickSync, I haven't used it as I unfortunately don't (yet) have access to an Intel processor that has the QuickSync technology. All my work computers are Xeons, and all my home computers are AMD doh. What I do know is that building the SDK for that is more complex, and the instructions are here: https://github.com/neutrinolabs/xrdp/issues/1422#issuecomment-1069527886

Hopefully this helps! Please keep the questions coming!

matt335672 commented 2 years ago

Hi @Nexarian

I don't fully understand where your concerns are with xrdp_mm.c. Could you be a bit more explicit please?

pnowack commented 2 years ago

devel today has non-Progressive RFX encoder (librfxcodec), and the performance of this is superior to what I believe is called raster "fastpath" which is what most people are still using today. Some customers of XRDP have told me they use fastpath with 16 bit color just to save bandwidth and make it even remotely reasonable.

Fastpath and Slowpath are unrelated to codecs. In fact, your plain RFX encoder runs via the fastpath too. What you are referring to here are RLE encoded bitmaps.
The fastpath has reduced or removed some headers compared to the Slowpath, so it is pretty much mostly a bandwidth reduction.
The graphics pipeline btw uses the Slowpath (like all virtual channels). With that in my mind, you already know that the Slowpath is not necessarily slow.

Encoding with progressive RFX and/or H264 is what really speeds it up.

Always keep in mind: codec != implementation

Nexarian commented 2 years ago

@pnowack Thank you so much for the clarification!

Nexarian commented 2 years ago

@matt335672 The purpose of this is to respond to this comment: https://github.com/neutrinolabs/xrdp/pull/2175#discussion_r840557404

The xrdp_mm, while at best a loose confederation of states and variables, still has some level of state management. The question is: During the execution of the async resize state machine, is it OK for the encoder to be NULL? If not, is there a better way? I have tested this extensively during my prototyping with EGFX, so it "works," but is it good design?

matt335672 commented 2 years ago

I can't personally see a problem with using a NULL to indicate a missing or inactive encoder, providing that all such instances are checked for before being dereferenced. You could use an explicit variable for this, but based on what I can see so far, I can't see a point to it.

Is that helpful?

matt335672 commented 6 months ago

@Nexarian - can we close this now?

Thanks.

neutrinolabs / xrdp

How to best re-create the encoder for resize-on-the-fly #2313