rust-windowing / winit

Window handling library in pure Rust
https://docs.rs/winit/
Apache License 2.0

Query Z-order / window level #2602

Open cfkaran2 opened 1 year ago

cfkaran2 commented 1 year ago

Hi, I have a use case for knowing the Z ordering of all visible windows. I searched through the issues and PRs, and I think that #2534 and #1883 are almost what I'm looking for, except that they don't provide a way of querying the current Z order. Is it possible/easy/reasonable to add a new WindowEvent variant that reports the window's current on-screen order? Something like ZOrderChanged(i64)? The value is the stacking depth, with 0 being the front-most ('closest' to the camera). Negative values aren't allowed, but the type is signed so that platforms that don't use unsigned integers for this can still be supported. Note that there will also have to be a way of directly asking a window what its Z order is, so that users don't have to track that information themselves or reconstruct it in some manner (current_z_order(&self) -> Result<i64, NotSupportedError> would be my choice).
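For concreteness, a minimal sketch of the proposed surface. The names `ZOrderChanged` and `ZOrderExt` are hypothetical, not existing winit API, and `NotSupportedError` here is a stand-in for winit's existing error type:

```rust
/// Stand-in for winit's existing NotSupportedError.
#[derive(Debug)]
pub struct NotSupportedError;

/// Hypothetical new event variant reporting the window's stacking depth.
pub enum WindowEvent {
    /// 0 is front-most ("closest to the camera"); the value is signed only
    /// so that platforms with signed window levels can be represented.
    ZOrderChanged(i64),
    // ...existing variants elided...
}

/// Hypothetical extension so callers can query the depth directly instead
/// of tracking ZOrderChanged events themselves.
pub trait ZOrderExt {
    fn current_z_order(&self) -> Result<i64, NotSupportedError>;
}
```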

madsmtm commented 1 year ago

The design you've laid out sounds reasonable, and if you want to try to implement it, feel free!

Would you care to elaborate on the use-case a bit? (And if you do end up with a PR, make sure to include that in the code documentation as well ;) )

cfkaran2 commented 1 year ago

The tricky part is time; I never seem to have enough! That said, if I can find a way of convincing my boss that this is something I should be doing, then I'll give it a shot. Unfortunately, I only have access to Linux (Ubuntu) and OS X systems, so someone else would need to write everything else. Unless putting in todo!("Not yet implemented for <platform>") is acceptable?

As for what I want to use it for... I'm trying to create something similar to the magnifier in Apple's Preview App (see this image for an idea of what I'm talking about), but tuned to let you see 'slices' of higher dimensional data. As an example, imagine you have a 3D fluid simulator that can also simulate electrical and thermal flow. Instead of having several different windows open at the same time, you have one main window that shows you the object of interest, and an auxiliary window that shows the thermal flow of the window positioned directly underneath it. You drag the auxiliary window around the main window just like the magnifier, and you see the thermal flows directly underneath. But that's just the start!

Now imagine that you have several different types of auxiliary windows, one a magnifier, one a thermal view, and one an X-ray view. You layer the X-ray window over the main window, and adjust to see down to the depth of interest. Then you layer the thermal view over that to see the thermal flows within the X-ray view. Finally, you're really, really interested in one small region within that, so you layer the magnifier view over the whole lot. Now you have an intuitive understanding of where you are in space instead of having to correlate multiple windows that might be separated on your screen (or worse, separated by time so you have to remember what was in one view while looking at a different view).

The big win is that you can have different stacks of windows over different areas of the main window, so you can view different information in different regions at the same time. E.g., how increasing fuel/oxygen affects gas flow in one part of an engine, thermal characteristics in another, and estimated stresses in a third.

I hope that explanation makes sense. I managed to do a proof of concept a long time ago, but I can't remember which library I used at the time (it's been at least 5 years since I did it). That particular library's event system had a lot of lag, so I wasn't able to line up the contents of the different auxiliary windows with what was underneath as well as I would have liked. I'm hoping that winit has solved those issues so I can make this system work (the prototype was mesmerizing).

kchibisov commented 1 year ago

This won't work on Wayland (not that I care), and it will work in a weird way on X11, I guess. Am I correct that you want to rely on system blending? Can't you just use one window and model your own windows inside it? That's usually how it's done in games, etc., since you have control over the blending that way.
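For illustration, a minimal sketch of that single-window approach, in which the application composites its own "panes" and fully owns their z-order and blending (all names are illustrative, not winit API):

```rust
struct Pane {
    rect: (i32, i32, u32, u32), // x, y, width, height in window coordinates
    z: i64,                     // 0 = front-most, matching the proposal above
}

/// Sort panes back-to-front: the order a renderer would draw them in so
/// that alpha blending composites front panes over back ones.
fn draw_order(mut panes: Vec<Pane>) -> Vec<Pane> {
    panes.sort_by_key(|p| std::cmp::Reverse(p.z));
    panes
}

fn main() {
    let panes = vec![
        Pane { rect: (0, 0, 800, 600), z: 2 },     // main view
        Pane { rect: (100, 100, 200, 200), z: 1 }, // thermal pane
        Pane { rect: (150, 150, 80, 80), z: 0 },   // magnifier pane
    ];
    for p in draw_order(panes) {
        println!("draw pane at {:?} (z = {})", p.rect, p.z);
    }
}
```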

Regarding your example, I don't think you need Z-ordering here at all. There are plenty of magnifiers out there and I haven't seen any of them use Z-ordering information. What they do is screen capture and draw their own window on top, or something like that. In general, a window can't read what's behind it.

The problem with frame requests is that I'm not sure how they behave when your own window is included in the capture; maybe there's an option to opt out of capturing your own window in the tree?

cfkaran2 commented 1 year ago

@kchibisov Yeah, I was hoping that there would be a way via XDG for Wayland, but it looks like its concept of layers is more like thick slabs that contain groups of windows whose ordering within the slab is unknown to the client. And I hadn't thought about what would happen if someone wants to mix different compositors together (e.g., X11 and Wayland on the same desktop). I could see a problem with 'sandwiching' different compositor windows together, where there is an unrelated window between the application's main window and one of the auxiliary windows; in that case the auxiliary window will show content related to the application's main window, even though that window isn't the one that is immediately below the auxiliary window.

I was hoping to use system blending for everything, as it's a little less jarring than having a single window that I manage, especially when you have multiple monitors. You can drag the auxiliary windows off of the main window, but still know that they are available to drag back. With your suggestion, the auxiliary windows would be clipped to the main window's viewport.

> Regarding your example, I don't think you need Z-ordering here at all. There are plenty of magnifiers out there and I haven't seen any of them use Z-ordering information. What they do is screen capture and draw their own window on top, or something like that. In general, a window can't read what's behind it.

The Z-ordering comes into play when you want to stack multiple different 'devices' together, e.g., put a magnifier over a thermal imager, which is looking at a portion of the main application window. In this particular case the operations are orthogonal and order-independent, but we can imagine operations that don't commute, so the stacking order matters.
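To make the non-commuting point concrete, a toy example where two hypothetical pane operations give different results depending on stacking order (the operations are purely illustrative):

```rust
fn main() {
    let magnify = |v: f32| v * 10.0;  // a "magnifier" pane
    let clamp   = |v: f32| v.min(1.0); // a pane that clamps/thresholds

    let sample = 0.5_f32;
    // Stack A: clamp on top of magnify (magnify applies first).
    let a = clamp(magnify(sample)); // 1.0
    // Stack B: magnify on top of clamp (clamp applies first).
    let b = magnify(clamp(sample)); // 5.0
    assert_ne!(a, b); // the composite depends on the stacking order
}
```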

> What they do is screen capture and draw their own window on top, or something like that.

You mean 'making the pixels bigger'? That works, but it isn't what I'm after; I want to render new content into the auxiliary window. E.g., if I set the magnification to 10,000, screen captures won't work. However, if I'm doing a real-time render of some simulation, I might be seeing the entire ocean in the main window, and some kind of small-scale current in the auxiliary window. Or, if I'm designing a part, and want to know where I am in relation to some larger object, I may render the smaller part within the auxiliary window while still seeing the overall part.

> The problem with frame requests is that I'm not sure how they behave when your own window is included in the capture; maybe there's an option to opt out of capturing your own window in the tree?

I'm sorry, but I don't understand what you're saying here.

An Insanely Terrible/Genius Hack Part 1

All that said, your mention of screen captures made me realize that there is a very, very nasty hack that may work to distinguish the ordering of various windows regardless of the compositors in use: watermarking the content of the surface. The watermark lives in the least significant bit of each pixel, so it isn't visible to human beings, and consists of a tiling of QR codes that encode the window ID. When you do a screen capture, those QR codes will be in the captured image; search for them. You already know (from winit) the size and position of every window you've created, which tells you where to expect the QR tiles within the capture, and you can then work out the layering from the particular QR codes you actually see.
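A toy version of the watermarking idea, using a plain bit pattern rather than QR codes (so no robustness to scaling or blending; purely illustrative, not winit API):

```rust
/// Embed a window ID into the least significant bit of each pixel's red
/// channel, cycling through the 32 ID bits across the RGBA8 buffer.
fn embed_id_bits(pixels: &mut [u8], window_id: u32) {
    for (i, px) in pixels.chunks_exact_mut(4).enumerate() {
        let bit = (window_id >> (i % 32)) & 1;
        px[0] = (px[0] & !1) | bit as u8;
    }
}

/// Recover the first 32 embedded bits from a captured frame. A real
/// implementation would decode tiled QR codes for robustness instead.
fn extract_id_bits(pixels: &[u8]) -> u32 {
    let mut id = 0u32;
    for (i, px) in pixels.chunks_exact(4).take(32).enumerate() {
        id |= ((px[0] & 1) as u32) << i;
    }
    id
}

fn main() {
    let mut frame = vec![128u8; 64 * 64 * 4]; // dummy 64x64 RGBA frame
    embed_id_bits(&mut frame, 0xDEADBEEF);
    assert_eq!(extract_id_bits(&frame), 0xDEADBEEF);
}
```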

An Insanely Terrible/Genius Hack Part 2

Actually, I just realized that there could be another alternative, but I don't know enough about graphics programming to know if this is possible. Can a complete display frame be rendered on the GPU but then be thrown away between the frames that are actually displayed?

If so, we could give each surface its own unique color in the throwaway frame, which the GPU renders per pixel. We read that buffer back from the GPU to the CPU, and then start the second render (the one that will actually be displayed). Meanwhile, we know where the corners of the windows that winit created are. Compute the vertical and horizontal lines that pass through those corners and extend to the edges of the screen; these lines partition the screen into a set of axis-aligned bounding boxes. Within each box, the set of windows covering it is the same at every pixel (possibly empty), so we only need to sample a single pixel within each box to know which window is visible there (if any). We can combine this with the information we already have (the size and position of each window as reported by winit) to determine the Z-ordering of the windows (we can also determine which portions of the windows are hidden, if that can be used to reduce the computational complexity of rendering).
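A sketch of that sampling step, assuming an "ID buffer" readable from the CPU in which every pixel stores the ID of the topmost window that drew it (all names are illustrative, not winit API):

```rust
#[derive(Clone, Copy, Debug)]
struct Rect { x: i32, y: i32, w: i32, h: i32 }

impl Rect {
    fn contains(&self, px: i32, py: i32) -> bool {
        px >= self.x && px < self.x + self.w && py >= self.y && py < self.y + self.h
    }
}

/// Infer pairwise "above" relations by sampling one point in each box of
/// the partition induced by the window corners. `id_at` reads the ID
/// buffer at a pixel (e.g. after a GPU readback).
fn infer_occlusions(
    windows: &[(u32, Rect)],
    id_at: impl Fn(i32, i32) -> Option<u32>,
) -> Vec<(u32, u32)> {
    // Collect the x/y coordinates of every window edge.
    let mut xs: Vec<i32> = windows.iter().flat_map(|(_, r)| [r.x, r.x + r.w]).collect();
    let mut ys: Vec<i32> = windows.iter().flat_map(|(_, r)| [r.y, r.y + r.h]).collect();
    xs.sort_unstable(); xs.dedup();
    ys.sort_unstable(); ys.dedup();

    let mut on_top_of = Vec::new();
    // Sample the center of each box in the grid partition.
    for wx in xs.windows(2) {
        for wy in ys.windows(2) {
            let (cx, cy) = ((wx[0] + wx[1]) / 2, (wy[0] + wy[1]) / 2);
            let Some(visible) = id_at(cx, cy) else { continue };
            // Every window covering this box but not visible here is occluded.
            for &(id, r) in windows {
                if id != visible && r.contains(cx, cy) {
                    on_top_of.push((visible, id)); // `visible` is above `id`
                }
            }
        }
    }
    on_top_of.sort_unstable();
    on_top_of.dedup();
    on_top_of
}

fn main() {
    // Two overlapping windows; window 2 is drawn on top in the ID buffer.
    let windows = [(1u32, Rect { x: 0, y: 0, w: 100, h: 100 }),
                   (2u32, Rect { x: 50, y: 50, w: 100, h: 100 })];
    let id_at = |x: i32, y: i32| {
        if windows[1].1.contains(x, y) { Some(2) }
        else if windows[0].1.contains(x, y) { Some(1) }
        else { None }
    };
    println!("{:?}", infer_occlusions(&windows, id_at)); // [(2, 1)]
}
```

From the pairwise "above" relations a total Z-order would follow by topological sort, assuming the windows overlap enough to be compared at all.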

Uhhhh... I don't know enough to make either idea above work...

So, do you guys know if either of the tricks above might work?