microsoft / RoomAliveToolkit


Unity and JPEG Compression #82

Open NPatch opened 6 years ago

NPatch commented 6 years ago

The tutorial notes that JPEG compression should be enabled. I understand it's better for the KinectServers exchanging data, since the payload is much smaller. But Unity doesn't play as well with it. Even though I've known this from experience, let me offer some numbers: I see a 3-4x speedup in framerate when I turn it off. On every frame where the server actually delivers new data, Texture2D.LoadImage takes up to 40 ms, which shows up as a periodic spike. When I turn JPEG compression off, the frame time goes down to 11-12 ms on average, with some spikes at 20 ms. At this point I'm trying to figure out how to do a 1x1 setup, but eventually it will have to scale up to 3x3. Will this be an issue even if switches are used (eliminating wifi connectivity issues)?
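For reference, the spike numbers come from a timing wrapper like the one below (my own code, nothing RAT-specific; colorTex and jpegBytes stand in for whatever the client hands you):

```csharp
using UnityEngine;

// Minimal timing sketch: wrap the color-frame upload in a Stopwatch to see
// where the frame-time spike comes from when JPEG compression is on.
public class ColorFrameTimer : MonoBehaviour
{
    public Texture2D colorTex;

    public void OnColorFrame(byte[] jpegBytes)
    {
        var sw = System.Diagnostics.Stopwatch.StartNew();
        colorTex.LoadImage(jpegBytes); // JPEG decode + GPU upload; this is the ~40 ms hit
        sw.Stop();
        Debug.Log("LoadImage took " + sw.ElapsedMilliseconds + " ms");
    }
}
```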

NPatch commented 6 years ago

Any ideas on this one?!

NPatch commented 6 years ago

By the way, if you disable JPEG compression, Unity does not know how to handle the input and renders a big white texture with a red question mark. I tried adding some code in RATKinectClient to use LoadRawTextureData when uncompressed color is streamed, but the type of the image data is 0, so it's not easy to tell whether it's BGRA, RGBA or whatever else the Kinect can give. I also tried toggling Stream RAW Color and Process Color Frames. In both cases the texture visualization is wrong: in one case you get the question mark, and in the other you get a big gray image (possibly just alpha).

NPatch commented 6 years ago

My bad... I forgot to call Apply after LoadRawTextureData to upload the loaded buffer to the GPU; LoadImage does that automatically. Now I get color info, but it's not correct.

UPDATE: To be more precise, it looks like the X/Y image direction flags are messed up, or the sync between the color and IR-based frames is off, or both.

UPDATE 2: Since Unity gets an overall speedup with uncompressed data, I can't attribute any desynchronization to sending the uncompressed frames over the sockets.
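For anyone hitting the same wall, the fix was just this (sketch, my own naming):

```csharp
// LoadRawTextureData only fills the CPU-side copy of the texture;
// Apply() is what actually uploads it to the GPU. LoadImage() does both itself.
colorTex.LoadRawTextureData(rawColorBytes);
colorTex.Apply(false); // no mipmap regeneration needed for a video frame
```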

thundercarrot commented 6 years ago

An uncompressed 32-bit color stream at 30 Hz requires almost 2 Gbps of bandwidth. The synchronization issue is probably due to the network falling behind and buffering the data (as TCP will do). I don't think the Unity version of the Kinect server drops frames; I think it just opens a socket and starts writing frames.

1920 w x 1080 h x 32 bits x 30 Hz / 1024 / 1024 / 1024 ≈ 1.854 Gbps

NPatch commented 6 years ago

I was talking about the Unity version of KinectServer though. If JPEG compression is turned on, it might be faster to transmit but it's a lot slower to load into a Texture in the Unity runtime. On the other hand, if you turn the compression off, the texture in Unity gets corrupted because the code was written to only work with JPEG compression on.

thundercarrot commented 6 years ago

I'm talking about the Unity version of KinectServer as well. You might need a faster computer? Also, do you really need the color information?

NPatch commented 6 years ago

Yes, we need the color info. And we want to be able to support mirror-like scenarios if possible, even if we only stream a textured body index from the Runtime servers.

I have tried a faster computer and it only improved the speed of getting the data from the service. The profiling hotspot is not the grabbing of frames but the JPEG decompression, which is inherent to Unity's JPEG loader. I switched out the laptop for a VR-ready desktop with an i7, 16 GB RAM and an NVIDIA GTX 680; even though that GPU is discontinued, it has good enough specs to support such heavy operations. Even that machine gave 13-22 FPS in a 2x2 scenario with JPEG compression on. I changed the code a bit to handle uncompressed data correctly and the application's FPS rose to 70, since there's no decoder/decompressor in between. Which leads me to believe the loader is probably not that optimized, and also not hardware accelerated, so it can't provide good performance on a per-frame basis. This also stems from the fact that Unity's textures, when not marked unreadable, keep a CPU-side copy that always has to be in sync. So I imagine they decompress on the CPU, write to the CPU side of the texture, and then commit the final version to the GPU.

The upside of the uncompressed case is that loading is fast and doesn't hinder the runtime much, but you can easily end up with a delayed color frame: you see the mesh moving ahead while the texture isn't synced, and the color only lines up some time later. On the other hand, if you use JPEG compression, the frequency of the color frames is higher and they are definitely synced with the depth mesh at all times, but the whole application gets much slower, and that's just from grabbing the data. If you want to do more on top of that, it gets even slower.

Of course, we are still evaluating this, so we'll either reject some scenarios or continue with lower expectations, depending on the case.

But since this is a loader performance issue, which is basically a Unity issue, it's hardly something you can fix. So we can consider this issue closed (although further suggestions/observations/discussion are always appreciated).

NOTE: RATKinectClient does not handle loading uncompressed RGB color at all. LoadImage does not work with uncompressed data, which is why I saw that white texture with the red question mark. It can be fixed with an if/else block that checks the length of the byte array (sketched below): if the length equals 1920x1080x4 bytes (ARGB32), it's uncompressed data, so we call tex.LoadRawTextureData and Apply (which commits the data from the CPU side to the GPU); otherwise the length is far smaller, roughly a quarter of the uncompressed size, and we use LoadImage (the JPEG loader, which calls Apply automatically). Apart from that, the compression flag must be accessible to RATDepthMesh so the RGBImageDirectionFlags can be set properly per axis (for some reason Y is flipped with JPEG). It might also make sense to avoid compression when dealing with input from the same machine, since bandwidth should not be an issue there. If that holds, there is merit in fixing this and supporting both compression options in the Runtime Kinect Server.
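The branch I have in mind looks roughly like this (sketch only; the class and constant names are mine, not the toolkit's):

```csharp
using UnityEngine;

public static class ColorFrameLoader
{
    // Full Kinect v2 color frame: 1920 x 1080 pixels, 4 bytes per pixel.
    const int RawColorSize = 1920 * 1080 * 4;

    // Assumes tex was created with a matching 32-bit format (e.g. BGRA32).
    public static void Load(Texture2D tex, byte[] data)
    {
        if (data.Length == RawColorSize)
        {
            // Uncompressed path: copy the raw bytes and commit them to the GPU ourselves.
            tex.LoadRawTextureData(data);
            tex.Apply(false);
        }
        else
        {
            // Compressed path: LoadImage decodes the JPEG and uploads it itself.
            tex.LoadImage(data);
        }
    }
}
```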

NPatch commented 6 years ago

Btw, this might not be the appropriate thread, but I have a question: how does the calibration handle Kinects that don't cover any projections, if at all? For mirror-like scenarios there would need to be an extra sensor above the projection, looking down. But in a cave environment that sensor will either capture parts of the gray codes from the side projections, or nothing at all. Does it get ignored if it captures none of the projections? If it can take part (for some reason I can't think of right now), can it be the primary coordinate system that all other sensors get converted to? Or would we have to provide pose matrices etc. by hand to support, at the very least, the conversion to the primary coordinate system provided by one of the valid Kinects in the setup?

thundercarrot commented 6 years ago

The first camera listed in the calibration file establishes the coordinate system. All other camera and projector poses are in that first camera's coordinate system. If you calibrate successfully, the pose for that first camera will be the identity matrix.

If you have another camera that doesn't participate in the calibration, I'm not sure how RAT behaves; I would not be surprised if it barfs. In that case it is best to treat it separately and find some other means of calibrating it. E.g., this setup:

https://www.youtube.com/watch?v=9A18AxfC2tM&t=0s&list=PLKcZe1FzCK3i-TjlWzP9TqBD6rGo1FWi6&index=14

I added a fourth camera for body tracking and established its pose wrt camera 0 by hand.
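In Unity terms that amounts to a hand-authored rigid transform into camera 0's frame, something like this sketch (not toolkit code; the names are made up):

```csharp
using UnityEngine;

// Bringing an extra, uncalibrated camera into camera 0's coordinate system.
// cam4ToCam0 is the 4x4 rigid transform you measure or estimate by hand;
// camera 0's own pose is the identity.
public class ExtraCameraPose : MonoBehaviour
{
    public Matrix4x4 cam4ToCam0 = Matrix4x4.identity; // fill in by hand

    public Vector3 ToCamera0(Vector3 pointInCam4)
    {
        return cam4ToCam0.MultiplyPoint3x4(pointInCam4);
    }
}
```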

I gotta admit I'm a little flat-footed on the Unity JPEG decompression issue, as I have lately been working on a new release of the RoomAlive Toolkit that pushes most of the heavy lifting into a native plugin for Unity. This approach allows for much more modularity and reuse when moving from one game engine or rendering framework to another. It currently supports 8 Kinect cameras at video rate. For JPEG decompression it uses the Windows built-in WIC codec, which is quite fast. It also uses an entirely new REST server that allows frames to be dropped.

I'm puzzled as to why uncompressed color would work at all given the bandwidth calculations I included earlier.

NPatch commented 6 years ago

Ok, thanks for answering. I was already aware of that video, and I think you mentioned in a Channel 9 post that the face-tracking Kinect is treated separately, but that post/video was from 2015 and I was wondering whether that had changed at all. I'm still hoping that if I place a Kinect above the center wall projection, looking down at the user, it might work if it can at least see some of the gray codes from the side walls.

Yeah, this issue has been a bane for me. Same goes for the PNG loading. For the record, I'm using Unity 5.6.6, so I'm not aware of any speedups in the 20XX versions. As for the native plugin, I assume you mean a rendering plugin that receives data directly from the sensors in C++ land, updates DX11 texture resources, and returns those DX11 texture pointers back to Unity, so the C# Texture2D is basically just a wrapper around a DX11 pointer you manage yourself. I've written one before as a perf experiment to compare against the classic Kinect v2 SDK Unity plugin and it did quite well. Also, the ability to optimize the data layout of other data (like joints) was very welcome. I'm sure it's nowhere near what you're making, though. Will it be available publicly? Or is it a pet project?

I'm not sure what decoders Unity uses, but decoding expands the data back to the uncompressed size before committing it to the GPU and its CPU counterpart. So higher loading times for compressed data should make sense, since it's extra work. Right?

thundercarrot commented 6 years ago

Yes, I plan to release it. I can't make any promises as to when though.

Yes, it makes sense that you would have higher load times for compressed data. You can trade that off against the network transfer delay you incur when you do not compress. I speak to this a bit in a paper I published last year on depth image compression: https://dl.acm.org/citation.cfm?id=3134144

NPatch commented 6 years ago

Yes, I've seen the video before on the MS Research channel (I think). A question: so far I've come to understand that the MS SDK works like this: there is a service in the background that gets the raw input from the sensor, sends it to the GPU for processing and expansion (IR -> Depth, BodyIndex, Body), and then brings it back to CPU land to distribute to any interested listeners. Basically, there are no half measures: the full work is done every frame, and applications then ask for the subset of data they are interested in. Can the NUC support all that workload from the SDK, plus the compression, and still come off as real-time data? Obviously it can, since you've used it, with 8 Kinect v2s no less, in a paper with good results, but I don't see how. I've had laptops underperform so far even though their specs were supposedly up to par. Is there something I'm missing?!

thundercarrot commented 6 years ago

In our experience the NUCs work very well here. Re: other laptops etc. keep in mind that the Kinect can be very picky about what USB3 chipset is used. This is documented elsewhere. The NUCs from Intel use the Intel chipset (unsurprisingly) which is one of the recommended makes.

NPatch commented 6 years ago

I know; as I said in a previous response, the other day I switched from an Alienware 14 laptop because every sensor I used, in all three of its USB 3.0 ports, had a ~1000 ms response time in the calibration version of the KinectServer, and as soon as I switched to the desktop I got ~100 ms. The weird part is that my Lenovo Y50-70 laptop, which hosted the second Kinect in the setup and sent packets over a network switch, had a ~250 ms response time and was far faster at delivering depth images than the Kinect physically connected to the machine running the calibration. On paper both laptops should have the same Intel chipset, but when I ran the Intel Chipset Identification Utility, the Alienware came up as Mobile Express while my laptop came up as Intel 8 Series (on the Lenovo spec it was HM87, and I think the same one is mentioned in the Alienware 14 spec sheet). Obviously one of the two is different somehow, but I haven't figured out why yet.

I was also referring to the NUC's GPU, and not just the USB3/PCIe part of the process. Could you share the NUC models? If you aren't doing anything custom to them, it would be incredibly helpful to know of a small machine that can satisfy the needs of NxN scenarios. Presumably it would minimize costs for all machines other than the primary one, which has lots of requirements (calibration hub, Kinect, all projectors, etc.) and has to be heavy duty.

As for the documentation, it also lists Renesas, but I've come across posts saying Renesas didn't work. Compatibility is an issue that still comes up in new MSDN posts. I'm guessing it's also an architecture issue; otherwise we'd have more concrete info by now on how to choose PCIe expansion cards, other than that they have to be Gen 2.0+.

UPDATE: I forgot to mention that one of my doubts about the NUCs was that the SDK seems to get heavier the more people are in the sensor's FOV. I've read on the SDK forum that the sensor keeps tracking additional people using 3 points so it has candidates to switch to when one of the six fully tracked bodies exits the FOV. Our applications tend to be intended for social events and kiosks. So far we've used an i7 with a GTX 1050 to properly mitigate that (though I'm not sure whether there were issues with the PCIe/USB 3.0 speed; it's an area I've only just started getting a clearer picture of).

tkbala commented 5 years ago

I realize that this is an old thread, but it turns out I am facing the same issue. Any thoughts now on how one could speed up Texture2D.LoadImage for loading the RGB frame data?

NPatch commented 5 years ago

When I tested things I was using Unity 5.6, if I'm not mistaken. I don't know whether the 20XX versions have improved the JPEG encoder/decoder. That's the only thing I can think of at this point.

tkbala commented 5 years ago

I am using the 2018.3 version and running into the same issue. I was looking for ways to run the JPEG decoder on a parallel thread, but I haven't had any success with that yet.

NPatch commented 5 years ago

AFAIK, the regular API in Unity is not thread-safe, so you can't call it outside the main thread. I guess the only way to do it would be a native render plugin where you create your own D3D11 texture, use your own decompressor/decoder, and give Unity a pointer to it to use as a Texture2D object (rough sketch below). If Unity 2019 hasn't introduced a change that allows you to do it in C#, going C++ is the only way. Can't help with that, though.
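On the Unity side, the hand-off would look roughly like this (sketch only; GetNativeColorTexture and the plugin name are hypothetical, standing in for whatever native plugin you write):

```csharp
using System;
using UnityEngine;

// The native plugin owns a D3D11 texture and decodes/uploads frames itself;
// Unity just wraps the native pointer in a Texture2D.
public class NativeColorTexture : MonoBehaviour
{
    [System.Runtime.InteropServices.DllImport("KinectColorPlugin")] // hypothetical plugin
    static extern IntPtr GetNativeColorTexture();

    Texture2D colorTex;

    void Start()
    {
        colorTex = Texture2D.CreateExternalTexture(
            1920, 1080, TextureFormat.BGRA32,
            false, false, GetNativeColorTexture());
    }
}
```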

nsmith1024 commented 5 years ago

Has anybody solved this issue? I'm having really bad performance and stuttering when loading images. I think a solution is to let the browser load it for you, similar to how WebGLMovieTexture works, which uses the browser to play videos on surfaces within the Unity scene. WebGLMovieTexture is available for free on the Asset Store; you just pass it the URL of the movie you want played and the surface you want to play it on, and it plays asynchronously without any stuttering.

Maybe one way to speed up LoadImage is to do the same thing somehow with images rather than a video: pass the URL of the image to the browser via JavaScript, then let it draw the image onto the given Unity object asynchronously.