psmoveservice / psmove-ue4

Plugin for using PSMove as input into Unreal Engine 4. Currently obsolete. Please use PSMoveService.
GNU General Public License v2.0

Windows 7x64 Slow Orientation Tracking #18

Open Ixstala opened 8 years ago

Ixstala commented 8 years ago

Hi, great work getting all of this together and working rather easily on Windows. I've got your PSMove libraries working in my graphics framework, and the tracker seems to be working fine (even great), reporting >400 FPS while tracking a single Move.

The problem arises when I try to get orientation estimates using psmove_fusion_get_projection_matrix and psmove_fusion_get_modelview_matrix; the orientation updates very slowly, with roughly a second or more of lag. This affects psmove_get_accelerometer_frame as well.

I checked the raw accelerometer, magnetometer, and gyro data, and it is streaming fast, so I'm not sure why the orientation estimate is so slow. The orientation tracking works perfectly with your prebuilt binaries. I've run the magnetometer calibration too, and both files are in my AppData directory.

Any ideas?

If I can't get it working I'll probably just do my own implementation of the sensor fusion bit, although I would like to avoid that - it's so close to working!

Thanks!

cboulay commented 8 years ago

So, to be clear, you're not using UE4, correct? The UE4 plugin and the provided binaries are all set to use lowpass filtering and not Kalman filtering. See here and here.

The Kalman filtering never worked right for me, even after doing the smoothing_calibration. I'd get very noticeable lag, but nothing on the order of 1 second; I'm guessing it was closer to 100 ms.

If you initialize the psmove tracker using the settings as in the above examples, does it improve your orientation performance?

brendanwalker commented 8 years ago

The orientation smoothing doesn't use the Kalman filtering; that's only for position smoothing. The default orientation smoothing is done in psmoveapi in psmove_orientation_update and uses a modified Madgwick filter. You could try switching to the original Madgwick filter, but it has drift issues. If you want to try that, you can call psmove_set_orientation_fusion_type with a different filter after the call to psmove_enable_orientation in FPSMoveWorker.cpp.
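Roughly like this (a sketch only; the exact enum value name for the original filter is from memory, so treat it as a guess and check the header):

    // In FPSMoveWorker.cpp, after orientation has been enabled:
    psmove_enable_orientation(move, PSMove_True);
    if (psmove_has_orientation(move))
    {
        // Assumed enum name for the unmodified Madgwick filter:
        psmove_set_orientation_fusion_type(move, OrientationFusion_MadgwickIMU);
    }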

What does the output of your magnetometer file look like? Mine looks like this:

mx,my,mz -0.218764,0.922553,0.316330
axis,min,max
x,-211.000000,160.000000
y,-280.000000,87.000000
z,-44.000000,303.000000

There was someone else who lives in the southern hemisphere (Indonesia) who was having weird controller drift issues when they laid their controller flat. I was going to look into that tonight, as I suspect there might be some assumption my filtering code makes that works fine in the northern hemisphere but breaks in the southern hemisphere (i.e. the 'my' value of the calibration direction is negative instead of positive).

Ixstala commented 8 years ago

Hi, thanks for the responses. You are correct, I am not using UE4 for this project.

I went in and added the additional smoothing and tracker settings; it didn't seem to help. I just noticed that my raw accelerometer values are also quite laggy, though initially I thought they were fine. For example, if I flip the controller end over end, the output lags by half a second or more. I'm thinking I might have a problem with my polling frequency.

My magnetometer file looks like this:

mx,my,mz -0.33063,0.928896,-0.1634
axis,min,max
x,-59,271
y,-72,250
z,-38,278

So not that different from yours.

Ixstala commented 8 years ago

I just checked and I'm polling at ~30FPS, so roughly 33ms between calls to my update function. This does seem a bit slow.

After looking at the output a bit more the orientation does track well, but it seems to be buffered so my motions occur some time after the fact.

Thoughts?

Update: I'm making headway. If I comment out tracker_update_image and tracker_update, the orientation estimate becomes speedy and stable. Maybe my camera is only operating at 30FPS? Is there some way to set that?

brendanwalker commented 8 years ago

The framerate and resolution settings are part of the PSMoveTrackerSettings structure that gets passed into psmove_tracker_new_with_settings. The default should be 640x480@60fps.
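Something like this should force the mode explicitly. This is only a sketch; the field names below are my guess at the struct layout, so check psmove_tracker.h for the real ones:

    PSMoveTrackerSettings settings;
    psmove_tracker_settings_set_default(&settings);  // assumed helper for filling in defaults
    settings.camera_frame_width  = 640;   // assumed field name
    settings.camera_frame_height = 480;   // assumed field name
    settings.camera_frame_rate   = 60;    // assumed field name
    PSMoveTracker *tracker = psmove_tracker_new_with_settings(&settings);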

Just out of curiosity, do you have the same update latency issue when you run test_opengl.exe?

Also, if you are using the libusb camera drivers, they often have trouble in multi-threaded apps running faster than 30fps. The Windows implementation of the libusb driver doesn't support isochronous transfers and instead uses an event-based bulk transfer that ultimately blocks using WaitForSingleObject(), which has a hard time responding faster than about 20-30ms. You could try compiling against the commercial CLEye drivers instead of the libusb ones. Are you building psmoveapi directly, or just using the library included with psmove-ue4? And is your app a 32-bit application or a 64-bit one? I ask because CLEye only ships a 32-bit DLL.

Ixstala commented 8 years ago

Happy New Year!

I seem to have got this sorted now. I had to run a separate thread just for polling the controller and then move the tracker update calls into my drawing thread.
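In case it helps anyone else, the split looks roughly like this (a sketch with made-up names, not my actual code; it assumes psmove.h is already included):

    #include <atomic>
    #include <thread>

    std::atomic<bool> keep_polling(true);

    void poll_controller(PSMove *move)
    {
        while (keep_polling)
        {
            while (psmove_poll(move)) {}  // drain every queued input report
            // copy orientation/sensor state somewhere the drawing thread can read it
        }
    }

    // Started with:  std::thread poller(poll_controller, move);
    // The drawing thread keeps only the camera/vision work:
    //   psmove_tracker_update_image(tracker);
    //   psmove_tracker_update(tracker, NULL);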

test_opengl.exe works just fine for me, no issues. I'm using the psmove libraries from the psmove-ue4 project and targeting a 64-bit platform.

Now my only issue is libusb flakiness. Occasionally the camera will drop out for a split second, causing the controller to de-illuminate and the video feed to freeze. Depending on which USB port I'm plugged into, this can get worse. Fun!

brendanwalker commented 8 years ago

Yeah this libusb flakiness is not uncommon. The advice I usually give to people is as follows:

Since your application is 64-bit, that makes using the CLEye driver a bit more complicated. There is a way to spin up a host exe to load the CLEye DLL and then pull video frames over via shared memory. I have a modified version of psmoveapi that supports this, which I used in my psmove-unity5 project, but I have had mixed success with it: sometimes it works great and other times it stalls out completely. See the psmove-unity5 wiki for more details if you're interested.

Finally, I should mention that cboulay and I are actively working on a Windows service / OS X daemon replacement for psmoveapi that manages PS Move controllers, trackers, calibration, etc. Applications connect to the service over a socket to get the controller tracking data (the socket management is hidden behind a simple API). This will allow multiple apps to share access to the controller and remove the need to do blink calibration every time you start your app. I mention this because it may affect how you author your PSMove support into your app, depending on what your time frame/use case is.

Ixstala commented 8 years ago

I'm only building 64-bit because that seemed to be the current magic combination of available prebuilt libraries, working/available camera drivers, and a working Bluetooth pairing process (sans MotioninJoy) to get something done quick and dirty.

I tried running FRAPS on the test_opengl example and I get a locked 30FPS, which seems to be what I'm seeing in my own app (I have Vsync off). Before the tracker initializes my drawing loop is >2000fps.

CL Labs now advertises that their driver is compatible with 64-bit Windows, although you have to pay. Have you guys tested that out?

I'd be interested in testing out the CLEye drivers with a 32-bit build if my tracking speed would increase beyond 30FPS, but I couldn't find the .libs for psmove to link against (I'm using Visual Studio). I did buy the CLEye driver some weeks back, but it seems to be even flakier than libusb, and that's just in their video test software. Maybe I have a PS Eye with some loose wires; it was bought used.

If you have a repository for the windows service I'd be glad to help test it out. Sounds like a much needed improvement.

cboulay commented 8 years ago

You can still use 64-bit built utilities for bluetooth pairing, camera calibration, etc, and then use 32-bit binaries for tracking. PS3EYEDriver (and libusb) are open source so you can build them with 32- or 64-bit architectures.

CL Labs now advertises that their driver is compatible with 64-bit Windows, although you have to pay.

Unfortunately, in this case it only means that the driver runs on 32- or 64-bit Windows. The binaries themselves are still 32-bit.

I'd be interested in testing out the CLEye drivers with a 32-bit build if my tracking speed would increase beyond 30FPS, but I couldn't find the .libs for psmove to link against (I'm using Visual Studio).

There's "CL Eye Driver" and "CL Eye (Multicam) SDK". CL Eye Driver doesn't come with a header or .lib file; you communicate with it through Windows APIs (DirectShow) and modify its parameters through the registry. CL Eye SDK does come with a header and .lib, but programs built with this can only be used by people that have 'activated' their cameras. I'm not sure if there's a way to 'activate' without buying credits, and that seems to be even more expensive. (Users can also use 'Multicam' programs if they purchase the SDK, choose to install the 'developer' library files, and make sure the 'distributable' dll is not on the path).

Make sure when playing around with different drivers that you properly uninstall all of the old drivers before installing a new one. I think it's possible to get into a state where you have libusb finding the CL Eye camera, and you can get video, but none of the parameter settings will work.

cboulay commented 8 years ago

I tried running FRAPS on the test_opengl example and I get a locked 30FPS, which seems to be what I'm seeing in my own app (I have Vsync off). Before the tracker initializes my drawing loop is >2000fps.

In the UE4 plugin, we're communicating with psmoveapi in a secondary thread. That'll be necessary in any VR application, until PSMoveService is ready (and maybe even then).

But, I think there might be a problem in PS3EYEDriver/libusb. I've found that switching the framerate between 30- and 60-fps doesn't make much of a difference, and PS3EYEDriver 60 fps is obviously lower than CL Eye 60 fps. There might be an easy win here if we can find the problem.

PS3EYEDriver was originally Mac-only. I was the one who modified it to work on Windows, but I didn't check that all of the commands worked as expected; it seemed to work, and that was enough at the time. An easy place to test would be the SDL test app.

cboulay commented 8 years ago

In reply to my last comment: I just re-read what Brendan wrote above.

The Windows implementation of the libusb driver doesn't support isochronous transfers and instead uses an event-based bulk transfer that ultimately blocks using WaitForSingleObject(), which has a hard time responding faster than about 20-30ms.

That would certainly explain why changing the framerate from 30 to 60 appears to do nothing. The 'easy win' is probably not so easy. Maybe libusb on Windows can be fixed? This would benefit a lot of people.

A quick search found this. The linked pull request at the beginning of that thread is 404'd but the two commits can be found here.

brendanwalker commented 8 years ago

A quick search found this. The linked pull request at the beginning of that thread is 404'd but the two commits can be found here.

That's super interesting. JoshBlake's commits adding isochronous transfers to libusb look pretty straightforward to incorporate. I did some searching to see who had used isochronous transfers in libusb to provide a guide for how to proceed; it turns out there are two interesting cases.

In both examples it looks like you just fill the transfer request a bit differently (more in-flight packets) and the callback is a bit different (iterate over the iso packets). Other than that, the ps3eye.cpp code wouldn't have to change too much.

The only thing that makes me worried is that even with isochronous transfers, the same wait-for-event function using libusb_handle_events_timeout_completed (which calls WaitForSingleObject internally) is still used. However, after reading a bit about how isochronous transfers work here, it sounds like they lock in a latency (as opposed to bulk transfers, which have no guarantee on latency). So I think this approach is certainly worth a try.
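For reference, filling and servicing an iso transfer with libusb's async API looks roughly like this (the endpoint, packet count, and packet size below are placeholders, not the camera's actual values):

    #include <libusb.h>

    #define NUM_ISO_PACKETS 16
    #define ISO_PACKET_SIZE 1024

    static void LIBUSB_CALL iso_callback(struct libusb_transfer *xfr)
    {
        for (int i = 0; i < xfr->num_iso_packets; ++i)
        {
            struct libusb_iso_packet_descriptor *pkt = &xfr->iso_packet_desc[i];
            if (pkt->status != LIBUSB_TRANSFER_COMPLETED)
                continue;
            unsigned char *data = libusb_get_iso_packet_buffer_simple(xfr, i);
            // append pkt->actual_length bytes from 'data' to the current frame here
            (void)data;
        }
        libusb_submit_transfer(xfr);   // keep the transfer in flight
    }

    static int start_iso_stream(libusb_device_handle *handle, unsigned char endpoint,
                                unsigned char *buffer)
    {
        struct libusb_transfer *xfr = libusb_alloc_transfer(NUM_ISO_PACKETS);
        libusb_fill_iso_transfer(xfr, handle, endpoint, buffer,
                                 NUM_ISO_PACKETS * ISO_PACKET_SIZE, NUM_ISO_PACKETS,
                                 iso_callback, NULL, 0 /* no timeout */);
        libusb_set_iso_packet_lengths(xfr, ISO_PACKET_SIZE);
        return libusb_submit_transfer(xfr);
    }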

cboulay commented 8 years ago

Maybe unrelated, I was reading other libusb issues and I noticed that one libusb contributor kept saying "that'll be fixed with an upcoming event abstraction merge". Here is his fork that contains those changes. I suspect that this abstraction would make it easier to support isoc in Windows.

I also noticed that the isoc support seems to be backend-dependent. JoshBlake's commits linked above seem to be only for the libusbK backend. There's another issue that references a fork that implemented a UsbDk backend, which supposedly supports isoc out of the box. I couldn't find anything else about UsbDk or how to install it.

Ixstala commented 8 years ago

This sounds promising. Do you think you guys will be able to figure out how to add the ISOC support to speed up the video transfer?

cboulay commented 8 years ago

This issue is worth reading, especially the comments by Timmmm. I'm going to ping him and see if he had any success.

Ixstala commented 8 years ago

I was able to build your fork of psmoveapi with the CL Eye SDK for an x86 target in Visual Studio. I'm getting a solid 60FPS from the camera now, and tracking is great, with virtually no drift. There's still some lag, but it's likely just a handful of frames.

For some reason CLEyeMultiCam.dll is missing from Code Laboratories' latest SDK; maybe it's an oversight. Luckily I had it from a different API that I was exploring. There are two versions floating around, a 38k and a 45k file; the 38k version worked.

Timmmm commented 8 years ago

Hey guys, I never had any success with UsbDk or libusbK. At the time I wasn't sure my device firmware was working correctly, though, so maybe the host software was fine. For the device, I'm using an Atmel SAM3X8E on an Arduino Due, one of the very few Cortex-M chips that supports USB High Speed.

I did get both patches building though so I could send you the code if you like. One of the main things that scared me off UsbDk was that it totally broke USB when I installed it on my work laptop!

I eventually gave up on libusb entirely and just waited for Windows 10 to come out. I now have USB High Speed isochronous transfers working perfectly using the native WinUsb driver (example code here - scroll to Step 4). I'm using 1024-byte packets, with one packet per microframe so I get around 8 MB/s transfer speed. You can have up to 3 packets per microframe so you can get up to about 24 MB/s (exactly 24576000 B/s) but I haven't tested that.

I was planning to add native WinUsb ISO support to libusb after getting it to work, but I haven't got around to it, and probably won't because WinUsb has a much easier to use API (and probably someone else will do it eventually anyway).

I can send you any of the following code if you like:

  1. Working WinUsb code that sends ISO IN requests. It's very similar to Microsoft's example code.
  2. Working Arduino Due firmware that just sends dummy data to an ISO IN endpoint.
  3. Non-working (AFAIK) code that uses libusb, patched to support isochronous transfers via libusbK ("libusb-winiso").
  4. Non-working (AFAIK) code that uses libusb, patched to support isochronous transfers via UsbDk ("libusb-usbdk-backend-v3").

By the way another reason to avoid UsbDk or libusbK is that the driver signing has got more difficult in Windows 10. I'm not exactly sure of the situation - there are discussions on the libusb mailing list.

cboulay commented 8 years ago

@Ixstala The CL SDK installer puts the DLL into C:\Windows\SysWOW64. During setup, the installer will ask you whether you want to install the "distributable" or the "developer" library. The "distributable" one won't work unless your camera is "activated". The "developer" one should work without any driver installed, but you can't package a project with this library. We have the latest "distributable" DLL in our (still private) PSMoveService repo and it's ~80k.

@Timmmm This is going to be deprioritized for us for now but it is something that we are ultimately interested in fixing. I expect it will be useful in the future, so please send me (3) either to the e-mail I contacted you from or a link to a repo. As for (4), they are now up to v5... I wonder if there have been any improvements.

I'm going to ask someone else to take a look at this so the conversation might not be dead just yet.

cboulay commented 8 years ago

@rovarma Based on your interest in psmoveapi for Windows, I thought you might be interested in this thread.

rovarma commented 8 years ago

@cboulay Thanks for bringing this to my attention; this explains some of the issues I've been seeing as well (30 vs 60 FPS). I was planning on having a look at the problem in WPA, but haven't gotten around to it yet. I didn't realize libusb was using WaitForSingleObject internally, but that may indeed explain it.

When used with a timeout, WaitForSingleObject/WaitForMultipleObjects are dependent on the system timer resolution, which by default is 15.6ms, so depending on when the event is signalled (and the specific timeout used for the wait) within a system tick, it may wait up to ~15-30ms, which seems to fit with the delays @brendanwalker is seeing.

This problem (and related problems, such as Sleep(1) sleeping for more than 1ms) is usually fixed by calling timeBeginPeriod(1) somewhere during application startup, which increases the timer resolution to 1ms (don't forget to call timeEndPeriod when done!).

I can't test it right now (it's late), but perhaps somebody in this thread can test calling timeBeginPeriod(1) in the main() of one of the test apps and see if that improves things? If it does, libusb may be just fine and we won't need to look at replacements (perhaps timeBeginPeriod(1) should be part of PSMoveAPI's init in that case?).
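Something as simple as this in the test app's main() should be enough to check (sketch only):

    #include <windows.h>
    #include <mmsystem.h>   // timeBeginPeriod / timeEndPeriod; link against winmm.lib

    int main(int argc, char *argv[])
    {
        timeBeginPeriod(1);   // raise the system timer resolution to 1 ms

        // ... existing test app / tracker loop goes here ...

        timeEndPeriod(1);     // restore the previous resolution
        return 0;
    }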

@Timmmm & @cboulay I'm not particularly familiar with the inner workings of USB, libusb, or isochronous transfers, but @Timmmm mentions an upper limit of 24.576 MB/s. Is that configurable? Because if not, that's clearly insufficient for our purposes, since we need at least 640 * 480 * 2 bpp * 60 FPS = 36.864 MB/s to drive the PS3 camera at 60 FPS. Nevertheless, @Timmmm, I'd be interested in your changes for (1), in case my timeBeginPeriod suggestion above doesn't pan out.

cboulay commented 8 years ago

@rovarma Can you also send me an e-mail chadwick.boulay at gmail.com ? I'd like to ask you about something else.

Timmmm commented 8 years ago

@rovarma Quick explanation of USB 2 speeds:

USB 2 "High Speed" (480 Mb/s) transfers occur in 1 ms frames, which are divided into 8 microframes (125 us each). With an isochronous transfer you can send up to 3 packets per microframe and the packets can be up to 1024 bytes each. That gives you a maximum speed of 1000_8_3*1024 = 24 MB/s.

Bulk transfers can be faster because they can theoretically have up to 13 512-byte packets per microframe, but the bandwidth is not guaranteed (for iso transfers it is). In practice, bulk transfers can get up to around 40 MB/s.

The packet size and number of packets per microframe are decided by the device, although I believe some devices provide several alternate interfaces with different sizes, so you may be able to choose. It is easy to see if this is the case by plugging in the device and using UsbView (on Windows; lsusb on Linux). Look at the wMaxPacketSize field (it also encodes the number of packets per microframe) and bInterval (if >1 then not every microframe is used).

More info here: https://msdn.microsoft.com/en-us/library/windows/hardware/ff539317%28v=vs.85%29.aspx
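To make the wMaxPacketSize encoding concrete, the decode is just this (per the USB 2.0 spec; the helper name is made up):

    // bits 10..0 = packet size, bits 12..11 = additional transactions per microframe
    void decode_wMaxPacketSize(unsigned short wMaxPacketSize,
                               int *packet_size, int *packets_per_microframe)
    {
        *packet_size = wMaxPacketSize & 0x07FF;
        *packets_per_microframe = ((wMaxPacketSize >> 11) & 0x03) + 1;   // 1..3 packets
    }
    // e.g. 0x1400 -> 1024-byte packets, 3 per microframe; 0x0300 -> 768-byte packets, 1 per microframe.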

If the PS3 camera can actually output 640x480x2@60fps I'd guess they are using some form of compression (or a non-standard USB implementation). Modern USB webcams all use on-board H.264 encoders to allow them to output HD video over USB 2.

rovarma commented 8 years ago

@Timmmm Thanks for the information. I've been doing some reading about isochronous transfers (in addition to your information) and I think I have a better understanding now.

Looking at the configuration of the isochronous endpoint on the PS3 Eye:

If I understand the meaning of bInterval correctly, this means that the driver will poll the camera every 8 microframes for new data (ie. every 1ms), leading to a theoretical max bandwidth of 768 bytes * 8 microframes * 1000ms = 6.144 MB/s on the isochronous endpoint.

This is clearly not enough to drive the video stream at 640x480@60 FPS (by far), which leads me to suspect that the isochronous endpoint on the PS3 Eye is intended to stream audio data (it also has a microphone array built in).

Please let me know if I misunderstood something; I am not an expert on USB.

Regarding the output of the video stream: the PS3 Eye output is in YUV422 format (i.e. 2 bytes per pixel), so 36.864 MB/s of bandwidth is definitely needed to stream 640x480@60 FPS.

@cboulay @brendanwalker Given all of the above, I don't think isochronous transfers will work for the video output, and bulk transfers will need to continue to be used. It's again late, so I'll try to have a look tomorrow at whether using timeBeginPeriod in one of the test apps makes a difference.

brendanwalker commented 8 years ago

@rovarma Ahh, I bet you are totally right about the iso endpoint being for the mic. I just did a quick test with timeBeginPeriod in psmoveapi and ran the test_tracker app locally. It /seems/ snappier, but I don't have any hard numbers yet. I have some watchdog timers implemented in psmove-ue4 on the psmove worker thread (in particular, around the camera update call), so I can do a before-and-after test tonight to see whether update perf is helped there.

cboulay commented 8 years ago

https://github.com/cboulay/psmoveapi now has its PS3EYEDriver submodule pointing to the modified version. Brendan, if you'd like to do it tonight, can you build the DLLs, drop them into psmove-ue4, and create a pull request? Otherwise, I should have time to do it tomorrow.

brendanwalker commented 8 years ago

Yup I was just working on that now. I'll update with a link when I get that in.

brendanwalker commented 8 years ago

I just did a test in both psmove-ue4 and psmoveapi with timeBeginPeriod(1) added to ps3eye.cpp. Sadly, I'm still getting frame update rates between 30-50fps. I added some timing code in psmove_tracker_update_image(), where we read the frame from the ps3eye code, just to be sure. At first I thought the issue might be that we forgot to drop the timeout value on the call to libusb_handle_events_timeout_completed (it was 50ms), but lowering it to 5ms did nothing. So unless I'm testing this wrong, this fix doesn't appear to give the perf gains we hoped for.

rovarma commented 8 years ago

Ah...that's unfortunate. I'll do some digging with WPA tonight to see if I can find something. Thanks for testing.

Ixstala commented 8 years ago

How does the CL Eye driver achieve 60FPS if there isn't an ISOC endpoint for the video camera? Maybe the camera frame rate/video mode is not being set properly with libusb?

Timmmm commented 8 years ago

@rovarma Yep that sounds right to me, although if bInterval is 4, and there is 1 packet per microframe it actually means that it sends 1 packet per 8 microframes, giving a data rate of 768 000 bytes/second.

Sounds about right for audio - two channels at 48 kHz, 16 bit is 192 000 bytes/second.

@Ixstala If they use bulk endpoints you can get up to about 40 MB/s. 640 x 480 x 2 x 60 is about 36 MB/s so they could do it that way (especially if they control the hardware and can put the camera on its own bus).

rovarma commented 8 years ago

Just a small update: I haven't had much time to look at this further (busy few days), but I hope to be able to do so next week somewhere.

rovarma commented 8 years ago

@cboulay @brendanwalker It's been a while, but I've finally been able to profile and optimize the camera performance.

To test, I've written a small test program that captures 1000 frames as fast as it can:

while (num_frames_captured < num_frames_to_capture)
{
    BeginFrame("Main");
    psmove_tracker_update_image(tracker);
    EndFrame();
    ++num_frames_captured;
}

This is pretty much as fast as it could ever get, since there is no processing going on; I'm simply reading the camera image as fast as possible.

Analysis with a hardware USB sniffer and WPA revealed that the cause of the bad/erratic performance was two-fold:

  1. The driver doesn't submit (nearly) enough USB bulk requests to saturate the bus; the bus is idle the majority of the time. Since the USB protocol is host-initiated (that is, the device will not send any data unless the host asks for it), this also means that the camera will not transfer data to the host for large amounts of time (the host does not ask it for data quick enough to maintain a stable data flow).
  2. This problem is further exacerbated by the fact that psmove_tracker_update_image roughly boils down to:

    while (!eye->isNewFrame) 
    {
       libusb_handle_events_timeout_completed(...);
    }
    
    uint8_t* frame = eye->getLastFramePointer();
    yuv_to_bgr(frame);

    libusb_handle_events_timeout_completed is the function that drives the transfers; it causes new transfers to be queued and queued transfers to complete. In the above loop, psmoveapi is only spinning and calling libusb_handle_events_timeout_completed until a new frame arrives. This means, however, that after the frame arrives, no new transfers will be initiated (until the next iteration), which in turn means that the camera will not send any data, which means dropped frames.

To fix the performance, I've done two things: queue more transfers simultaneously so the bus stays saturated, and drive the libusb event loop from a dedicated capture thread that hands completed frames to the consumer through a frame queue (see the commit linked below for the details).
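The first part boils down to keeping several transfers in flight at all times, roughly like this (a sketch against libusb's async API; the transfer count, size, and endpoint are placeholders, not the values in ps3eye.cpp):

    #include <libusb.h>

    static void LIBUSB_CALL bulk_callback(struct libusb_transfer *xfr)
    {
        if (xfr->status == LIBUSB_TRANSFER_COMPLETED)
        {
            // hand xfr->actual_length bytes from xfr->buffer to the frame assembler here
        }
        libusb_submit_transfer(xfr);   // immediately request more data from the camera
    }

    static void start_bulk_stream(libusb_device_handle *handle, unsigned char endpoint)
    {
        const int num_transfers = 8;       // placeholder count, enough to keep the bus busy
        const int transfer_size = 65536;   // placeholder size
        for (int i = 0; i < num_transfers; ++i)
        {
            unsigned char *buf = new unsigned char[transfer_size];
            struct libusb_transfer *xfr = libusb_alloc_transfer(0);
            libusb_fill_bulk_transfer(xfr, handle, endpoint, buf, transfer_size,
                                      bulk_callback, NULL, 0 /* no timeout */);
            libusb_submit_transfer(xfr);
        }
    }

    // A dedicated thread then just pumps libusb_handle_events(ctx), so completions
    // (and resubmissions) happen regardless of how fast the consumer reads frames.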

On the PCs I've tested this on, this fixes all the performance issues I've been seeing. Some side-by-side comparisons:

USB throughput

Before optimization: [before_usb screenshot]

After optimization: [after_usb screenshot]

As you can see, before optimization, USB throughput barely gets above 20 MB/s and is very erratic, with a lot of drops to even lower values (16 MB/s). After optimization, it's stable at 36 MB/s.

Note that both captures are from the same program capturing 1000 frames. In the second screenshot it finishes in about 16 seconds, which is what you would expect at 60 FPS (1000 frames * 16ms = ~16 seconds). In the first screenshot, the program takes way longer to finish (the end is out of the picture, but I believe it was around the 44 second mark).

Frame rate (WPA) analysis

Before optimization: [before_wpa screenshot]

After optimization: [after_wpa screenshot]

(Note that the displayed time range and scale are identical across both captures)

The diamonds at the top represent frame markers (they are the BeginFrame() calls you see in my sample code). The middle chart represents CPU usage; a solid color means the thread is currently scheduled in and running, white means the process was scheduled out (waiting for an event).

As you can see, about 30 FPS is the best we get before optimization, and there are a lot of (very large) frame drops, leading to a very erratic framerate. The thread is frequently scheduled out because it's waiting for data to arrive.

After optimization, the picture is much better: a frame marker every ~16.66 ms for a very solid 60 FPS. The main thread is continuously busy, except for some very small white ticks; that's actually where the main thread runs faster than the camera thread, so it has to wait for data to become available.

Where to go from here

I've submitted all of my changes to my own forks; the PS3EYEDriver changes are here: https://github.com/rovarma/PS3EYEDriver/commit/20d727b56de074f9c883aa89b2368cd5c45977bc

I would appreciate it if you guys (@cboulay @brendanwalker) could test it out. I think a test of test_tracker in my fork (with optimization) and some other fork (without optimization) will give a good enough indication of whether the performance is improved for you (or not).

Please note that even if the camera is capturing at 60 FPS, it's likely that OpenCV will not be able to display it at that rate; cvShowImage & Co seem very slow. Perhaps a better test would be to check the performance in test_opengl? I am not sure. Either way, try it out and let me know :)

Also, as you can see from the diff, I've made pretty extensive changes to ps3eye.cpp, notably the threading. While the threading code seems to work fine here, it's completely possible (probable) I've introduced some kind of deadlock/race condition somewhere, so I would hold off on using any of this in your ue4/unity projects just yet until it's further tested.

@cboulay I've tried to keep the changes I've made cross-platform (there is a pthreads implementation of the threading code), but I don't have an OS X box to test on. I'd appreciate it if you could try compiling it on OS X to see if it still works/runs.

cboulay commented 8 years ago

I modified the SDL example to work with the new API. See here. It works in OS X, but I'm getting some warnings that are of some concern.

../src/ps3eye.cpp:130:8: warning: implicit conversion of NULL constant to 'sem_t'
../src/ps3eye.cpp:146:45: warning: 'sem_init' is deprecated
../src/ps3eye.cpp:147:26: warning: 'sem_destroy' is deprecated
../src/ps3eye.cpp:213:10: warning: cast to 'void *' from smaller integer type

Also, I modified the test_camera app in PSMoveService to use the new repo. It seems to have high frame rate, but there is a (constant?) delay of more than a second from the time I do something in front of the camera until it appears on the presented image. This wasn't a problem before. Any ideas?

Note that the PSMoveService implementation is using the C++ API without any extra buffers or USB contexts. See here (locked to most).

rovarma commented 8 years ago

Thanks for giving it a try. I'm not too sure about the warnings; I got the semaphore code off some random internet page somewhere. I'll have a look for a better version tomorrow.

As for the delay, I think that's probably caused by the app running too slowly (< 60 FPS); it can't keep up with the camera, so it starts lagging behind. Though a lag of >1 second sounds quite extreme; I would expect it to be around 266ms (16 frames, which is the max in the buffer).

Can you try timing the mainloop and post some results? Otherwise I can try to have a look tomorrow.

cboulay commented 8 years ago

In the PSMoveService test_camera app, the culprits are

imshow("result", frame);
wk = cv::waitKey(1);

If I comment out either line then I get a solid 60 fps (but of course no image on the screen). This is acceptable, as we will only be displaying the image in a separate debug process that is not being used to calculate position. And, in that process, we probably won't even use OpenCV to display the image. The SDL example in my previous post displayed at 60 fps.

That being said, what would be the best way to reduce the latency if our process can't keep up with the framerate but we're OK with dropping frames? Should we expose the num_frames variable instead of using the hard-coded 16? I tried using 2 and this does reduce the latency when running at 30 fps. Or should we make a new function below Dequeue() that moves the buffer index to the front? Or something else?

As to the max lag, new frames only get inserted into the buffer whenever the consumer pulls one out, so if the consumer is pulling out frames at 30 fps then new frames are put in at 30 fps, so the oldest frame is > 500 msec old. It honestly felt slower than that but I'll accept that it was 500 msec.

rovarma commented 8 years ago

Thanks, that's good to know. I suspected it might be that; as I noted in my original post, the functions used to display images in OpenCV all seem incredibly slow.

I think your suggestion of exposing the number of buffered frames makes the most sense if the user is OK with dropping frames. A function next to Dequeue that moves the tail to the head would be easy to implement, but it would be tricky to determine when to then call that function. You would need some way of detecting that the latency is "too high" and then call it.

So I'm in favor of exposing num_frames; I'll make the change shortly.

rovarma commented 8 years ago

Hmm, so I'm having a look at those sem_t deprecation warnings and what I've found is quite scary:

http://stackoverflow.com/questions/1413785/sem-init-on-os-x http://stackoverflow.com/questions/27736618/why-are-sem-init-sem-getvalue-sem-destroy-deprecated-on-mac-os-x-and-w

In particular:

Wow. <semaphore.h> declares sem_init so that it compiles properly on OS X, but it returns -1 with errno set to ENOSYS (function not implemented).

I have no way of verifying that, but it would certainly be a new and novel way to go about deprecating APIs.

Fortunately, it seems that the fixes are pretty straightforward: replace sem_init with sem_open and sem_destroy with sem_close. I tried making these changes here, but it seems that pthreads-win32 does not include all the required types for sem_open. Specifically, I'm unable to find the required mode flags.

Can you try making this change on OSX?
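For reference, this is roughly the swap I have in mind on the OS X side (the semaphore name and mode bits are made up for illustration):

    #include <fcntl.h>       // O_CREAT
    #include <semaphore.h>

    sem_t *frame_sem = sem_open("/ps3eye_frame", O_CREAT, 0644, 0);  // was: sem_init(&sem, 0, 0)
    // ... sem_wait(frame_sem) / sem_post(frame_sem) as before ...
    sem_close(frame_sem);                                            // was: sem_destroy(&sem)
    sem_unlink("/ps3eye_frame");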

rovarma commented 8 years ago

I was thinking a bit more about the size of the frame queue and I've come to the conclusion that 16 is not a good default. The only reason I picked it in the first place is because the old (unoptimized) code already had a 16-frame buffer, but with the new code that deep a queue only adds latency whenever the consumer runs slower than the camera.

In addition, the producer blocking when the buffer is full is also not a good idea: it stalls the capture thread, which stops driving the libusb event loop, and latency just keeps accumulating.

So, I've made some changes (https://github.com/rovarma/PS3EYEDriver/commit/e4c2aba4ba48d27d6b75073f32b20b2ef9296a9d):

  1. The frame queue size is now exposed through PS3EYECam::init() and is 2 by default (double buffering, basically)
  2. FrameQueue::Enqueue doesn't block when the buffer is full anymore. Instead, if the buffer is full, it will overwrite the frame it previously produced with a new frame.
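Conceptually, (2) looks something like this (illustrative only, not the actual FrameQueue code):

    #include <cstdint>
    #include <mutex>
    #include <vector>

    struct TinyFrameQueue
    {
        std::mutex mutex;
        std::vector<std::vector<uint8_t>> slots;   // pre-sized to the frame queue size
        size_t head = 0;
        size_t count = 0;

        void Enqueue(const uint8_t *data, size_t len)
        {
            std::lock_guard<std::mutex> lock(mutex);
            size_t idx;
            if (count == slots.size())
                idx = (head + count - 1) % slots.size();   // full: overwrite the newest frame
            else
                idx = (head + count++) % slots.size();     // room left: append
            slots[idx].assign(data, data + len);
        }
    };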

I think that together these changes give the best result: minimal latency when the consumer keeps up with the camera, and graceful frame dropping when it doesn't.

This works very well for my test setups (60 FPS and < 60 FPS by adding artificial Sleep(...) calls).

Please give it a try and let me know how it works for you.

cboulay commented 8 years ago

I tried it and it worked well with both the SDL app and in PSMoveService's test_camera. To get rid of the deprecated semaphore functions, I made some changes (cboulay/PS3EYEDriver@baa5c575db22f0e5093bce83704552648dc09ee9). Note that I had to change sema to a pointer; I also had to give the semaphore a name and permissions that I simply guessed at from an example I found here.

cboulay commented 8 years ago
@rovarma can you please take a look at the [PS3EYEDriver/sdl/makefile](https://github.com/rovarma/PS3EYEDriver/blob/optimization/sdl/makefile#L15) and see if you can manage to include SDL2 in such a way that the [include statement](https://github.com/rovarma/PS3EYEDriver/blob/optimization/sdl/main.cpp#L7) is the same on both Windows and OS X? I don't have a mingw build system setup so I can't test this. If you get it to work then please update the README as well.

As the OS X instructions are to use homebrew to install sdl2, and the makefile uses a shell command to find the SDL2 include directory that returns `-I/usr/local/include/SDL2`, the `SDL2` prefix in the include statement should not be there on OS X. In the mingw README I actually added steps to make the SDL2 prefix necessary. I can't remember why I did that... but without those steps it shouldn't be necessary. I just need someone to verify.

I took care of this and made a pull request to inspirit's repo.

cboulay commented 8 years ago

About that previous comment, can you also change main.cpp to use 640x480@60fps instead of its current 320x240@187fps?

Another thought: I'm guessing it's impossible to make the changes you outline without changing the PS3EYEDriver API. Is that correct? If so, then can you open an issue on the Inspirit PS3EYEDriver repository outlining your proposed changes? If getting the high throughput is worth it then they might consider changing the API in the upstream repository. That way, more people will benefit from your work and, more importantly, the burden of supporting these changes will be shared by more people.

brendanwalker commented 8 years ago

@cboulay I just tried building ps3eye_sdl last night with the latest from your rovarma-optimization branch (I used manually set-up MSVC project files rather than mingw). I got 640x480@60fps (which is awesome!) but am seeing a lot of flickering artifacts on my Win10 home machine. I did get these flickering artifacts before, but they were more infrequent. I'll upload a video tonight and update this post so you can see what I mean. Is this something you guys have seen? I suspect this might be an issue with my local USB setup (too many devices on the same root hub). @rovarma What is the USB hardware sniffer you are using? I'd love to try and analyze what other traffic might be interfering with my camera's USB packets.

cboulay commented 8 years ago

I decided to set up a MinGW build system anyway. I was able to get the original PS3EYEDriver SDL program to compile (no rovarma optimizations). That went @ 60fps (!) but with frequent flickering. It's a bit surprising that it went at 60 fps, but this was on a high-spec desktop (i7, very high specs for 1 yr ago). I also tested on my Macbook Pro (2 yrs old), and it also went at 60 fps. I don't know whether to attribute that to better libusb handling in Mac or to the computer specs. But, what I'm getting at is that I can't reproduce < 60 fps performance on either of my computers except when I use OpenCV to display images (or maybe when I'm using more resources with psmoveapi and a game engine, but I never profiled that).

I tried to build the version in rovarma-optimization. I wanted to see if there was an improvement in the flickering. But I was getting build errors. @rovarma , have you tried a mingw32-make of the sdl example in your optimization branch?

Oh, and I answered my previous questions about the SDL2 include. Yes, if we remove the README instructions to move the include files around then both Windows and OS X can #include <SDL.h> and not <SDL2/SDL.h>. I made a pull request to inspirit's repo with these changes.

rovarma commented 8 years ago

Oof, a lot of things to respond to here! Let me know if I missed something :)


Another thought: I'm guessing it's impossible to make the changes you outline without changing the PS3EYEDriver API. Is that correct?

Yes, the API changes are unfortunately required. The reason is that in order to associate a specific libusb_context with a specific libusb_device, the libusb_context passed to libusb_get_device_list must match the libusb_context you wish to associate the device with.

The old API was using a single context to enumerate all the devices once, and then returning that as an array of PS3EYERef, which caused all created devices to share the same context. This is normally fine, until you want to use multiple cameras simultaneously, then it breaks down. Internally all libusb_contexts share locking state, send/receive descriptors etc; this is very poorly documented (like most of libusb, really quite frustrating).

If so, then can you open an issue on the Inspirit PS3EYEDriver repository outlining your proposed changes? If getting the high throughput is worth it then they might consider changing the API in the upstream repository. That way, more people will benefit from your work and, more importantly, the burden of supporting these changes will be shared by more people.

Yes, I was planning to do that anyway, but I wanted to wait until it had been further tested, before pushing this to any mainline branch.

I did get these flickering artifacts before, but they were more infrequent. I'll upload a video tonight and update this post so you can see what I mean. Is this something you guys have seen? I suspect this might be an issue with my local USB setup (too many devices on the same root hub).

It depends on what you mean by 'flickering'. Is it actual frame corruption? Or does it look like your frame is having the right/left sides flipped (basically flip around the vertical axis) every other frame? I suppose your video will illustrate it more clearly, but if it's the second case, I have in fact seen that before and that was due to a bug in the software (psmoveapi) rather than in the driver/camera/usb hardware.

As for the USB hardware sniffer, I used this thing, mostly because I could borrow one from work. But I am not sure it will help with multiple devices on a single controller interfering with each other; it's an inline analyzer that you put between your device and the port it's connected to, and as such it will only capture data on that line.

Do note that I am far from an expert on USB performance analysis; about the only thing I got from the hardware analyzer was that the device did not seem to be sending packets at the rate I would expect, which led me to read about how the USB protocol actually works in the first place. The optimization to increase the number of simultaneous transfers was more a reasoned guess based on the protocol documentation than any hard proof I got from the analyzer.

Before you go the analyzer route, here are some things you might try:

  1. You mention that you have a lot of devices attached to the same hub. Is it possible to just disconnect everything but the camera and see if it goes away?
  2. I used this USB software analyzer to generate the USB bandwidth graphs in my original post. It has a free trial, but running it should at least give you a good idea of where the problem lies; if it's reporting a bandwidth of ~36 MB/s, then it's a fair assumption that there's actually nothing wrong with your controller (it's getting 60 FPS from the camera at that rate). It also shows the USB protocol packets, so that may even give you an idea about other-device interference.
  3. Are you using any USB extension cables between the camera and your PC, by any chance? If so, did you try plugging it in directly to see if that fixes any issues? I did have some flakiness with extension cables a while ago, which was solved by switching to active USB extension cables.

That went @ 60fps (!) but with frequent flickering. It's a bit surprising that it went at 60 fps, but this was on a high-spec desktop (i7, very high specs for 1 yr ago). I also tested on my Macbook Pro (2 yrs old), and it also went at 60 fps. I don't know whether to attribute that to better libusb handling in Mac or to the computer specs. But, what I'm getting at is that I can't reproduce < 60 fps performance on either of my computers except when I use OpenCV to display images (or maybe when I'm using more resources with psmoveapi and a game engine, but I never profiled that).

Yes, the bad performance is far from a reproducible thing. My main desktop has zero issues with running at 60 FPS, but it's a pretty beefy machine. I suspect it's because the 'outer' loop (that calls psmove_tracker_update) is so fast that it can actually queue the 2 transfers back to back, leading to no bandwith loss, but I am not 100% sure about that. My two other test machines get nowhere near the required bandwidth, even though they're not like 10 year old PCs or whatever.

Bottom line, I don't think my optimizations are 100% required in all cases, but they do make things more reliable in the general wildly-varying-hardware case.

have you tried a mingw32-make of the sdl example in your optimization branch?

No, I have not. I exclusively work with MSVC, but I can try setting up a MingW buildsystem. Do you happen to have some instructions somewhere that I can follow? I'm pretty stuck in my happy windows workflow :)

cboulay commented 8 years ago

No, I have not. I exclusively work with MSVC, but I can try setting up a MingW buildsystem. Do you happen to have some instructions somewhere that I can follow? I'm pretty stuck in my happy windows workflow :)

Instructions are here. But, as your Windows builds are using MS Win APIs, you'd have to change your platform checks to look for MSVC vs rest instead of WIN32 vs rest.

Alternatively, I think you can save a lot of cross-platform work by using std::mutex. Recent versions of MSVC have good C++11 support, as does Xcode, and some freely available mingw builds. Here is an example of semaphores in C++11.
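For example, a counting semaphore in portable C++11 is just a mutex plus a condition variable (a sketch; the class name is made up):

    #include <condition_variable>
    #include <mutex>

    class Semaphore
    {
    public:
        explicit Semaphore(int initial = 0) : count(initial) {}

        void post()
        {
            std::lock_guard<std::mutex> lock(mutex);
            ++count;
            cv.notify_one();          // wake one waiter
        }

        void wait()
        {
            std::unique_lock<std::mutex> lock(mutex);
            cv.wait(lock, [this] { return count > 0; });
            --count;
        }

    private:
        std::mutex mutex;
        std::condition_variable cv;
        int count;
    };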

rovarma commented 8 years ago

Thanks, I'll have a look at MingW tonight.

I actually originally tried using the std threading support, but it wouldn't compile under MSVC. I was getting a lot of compile errors in <ratio> about not being able to find types that should've been defined in <stdint.h> but somehow weren't.

I tried to get that to work for several hours before I gave up.

brendanwalker commented 8 years ago

@rovarma I gave Device Monitoring Studio a try while running my local version of ps3eye_sdl with various USB cable configurations. I originally had my PS3 Eye plugged into a USB3 hub with nothing else on it. To make sure I wasn't getting any other device interference, I pulled out everything except my keyboard and mouse and plugged the PS3 Eye directly into the back of my PC. I was still getting between 28 and 36MB/sec from the camera when running at 640x480x60fps:

[screenshot]

I tried the other resolutions and frame rates (320x240x60, 320x240x187, and 640x480x30) to see if they resulted in the same variability, but they were always stable (at 8, 28, and 18MB/s, respectively).

Then after removing a sleep statement in the frame polling loop I got a stable 36 MB/s:

[screenshot]

So this would seem to suggest either that the sleep was randomly taking longer than 10ms, or that falling behind on reading frames chokes up the event polling somehow?

rovarma commented 8 years ago

@brendanwalker The fork you linked to appears to contain the vanilla version of PS3EYEDriver (ie. without my changes). Is that the version you ran the usb monitor against?

If so, then what you're seeing makes complete sense: in that version, the FPS you get is directly dependent on how frequently you call PS3EYECam::updateDevices, since that drives the libusb event loop. If you don't call it frequently enough, libusb will not queue its transfers back to back, causing you to lose bandwidth (the camera will not be sending data).

Part of the change I made in my fork was specifically to cut this dependency; the libusb event loop is now driven from a separate thread, so the program consuming the frames does not have to worry about these details and can simply get frames as fast (or slow) as needed.

Regarding the sleep resolution: Sleep on Windows is actually pretty precise, but only as precise as the system timer resolution (the time you specify is quantized to the nearest system timer interval). You can set the system timer resolution to 1ms by using timeBeginPeriod(1), as I described earlier in this thread. After setting it, you should not see the variation you were seeing before.

Bruce Dawson has a pretty good post explaining the sleep behaviour over here.

Does removing the sleep also fix the flickering issue you were seeing? Or did that only show up when using my fork?

cboulay commented 8 years ago

@rovarma Did you try MSVC 2015? (It's actually supposed to be there as of MSVC 2012, though buggier in older versions). Also, what time zone are you in? I'm EST (UTC-5) and Brendan is in PST (UTC-8) but he likes to work late.

rovarma commented 8 years ago

Nope, I'm using 2013 at home. I've been considering the switch to 2015 (have been using it at work for about a year now), just haven't gotten around to it yet.

However, I believe it should indeed work with 2013; it's just some weird include ordering issue. Googling the error turns up some people who 'fixed' it by fiddling with the include ordering, but I couldn't get it to work. I can't remember the exact error off the top of my head, but if you #include <thread> in ps3eye.cpp you should see it.

I'm in GMT+1 (Europe).