Remote control car example with OpenCV UI

davinellulinvega commented 3 years ago

A simple script showing how to use OpenCV to remotely control Cozmo, while also streaming its camera feed in either color or gray scale. This is linked to issue #37.

davinellulinvega commented 3 years ago

I added the image resize and the sharpening for two main reasons:

Nobody likes to drive while squinting at a small patch on their screen.
The sharpening is in preparation of other up coming contributions focusing on head, cube, and pet detection/tracking. For those to be effective, the sharper the image, the easier my job will be down the line.

davinellulinvega commented 3 years ago

@gimait Thanks a lot for the feedback. I like your idea for the control scheme a lot more than what I implemented so far. I'll see what I can cook up with pynput and some OpenCV trackbars. The main reason why I stuck with OpenCV's waitKey() method is that I was not aware of the existence of pynput, pure and simple. All the other keyboard related libraries I have tried so far either were not cross-platform or required an elevation to root to work. Concerning your last comment, again you are entirely correct. Here I was being a smartass in my comment trying to save compute time while overlooking all those useless conversions. Well time to get back to work and modify all this. If you have some time to spare and an interest in the subject, might I have your opinion on issue #47 ? Again thanks a lot for being awesome and for your ideas.

gimait commented 3 years ago

Cool! Looking forward to see what you come up with. Hope it works with pynput, I have only used another package, getkey, in the past, but I don't think its multiplatform and is long ago since it stop being maintained. Looks like this one has regular updates though, so I though you might be interested.

I'll have a look at #47, we were planning on implementing object recognition features little by little, so it's great that you already implemented something for face recognition. We can talk about it over the ticket.

davinellulinvega commented 3 years ago

@gimait So yeah, I am sorry for the 1 commit for all, but after your feedback I have been forced to reorganize my code into a class, which means basically rewriting everything. I am still not satisfied with the result, since the video feed is now all laggy, but I do not know how I might fix this issue. I you or anyone else has a suggestion (keep in mind that displaying the image cannot be put into another thread). Edit: As mentioned in my commits, it turns out I just did not read OpenCV's documentation hard enough. You can in fact put the display function (cv.imshow()) inside a thread, if it is followed by a cv.waitKey(). So I have done that in an attempt to make the video feed less laggy, but to no avail. pynput seem to be hogging all the cpu time for itself whenever a key is pressed for a long time.

davinellulinvega commented 3 years ago

After some more reflection on the laggy video feed, it seems pretty clear that the GIL is to blame for that (unless I have overlooked something). This brings the question: are we tied to only using threads or can we go multi-process? The idea would be to dedicate a single and simple process to displaying the video, while leaving the rest of the code intact. Any thoughts on that?

davinellulinvega commented 3 years ago

Once again thanks a lot for your overall (and more specific) feedback on this new implementation. Regarding the video feed, as I already commented above, I will indeed put it back into the main thread since it makes more sense to have it separated from the image retrieval/processing thread. As for the code structure, as you suggested it would be better (or more reusable?) to have two classes: one in charge of the display and the other responsible for keyboard events, as well as the different taskbars. I think I will still keep the OpencvRC class as the main entry point, at least for the example so that everything is a bit tidier looking. If you have any advice or specific requirements for structuring the code I would gladly take it. Otherwise, I will simply take what I already have and split it into two classes (plus make all the corrections mentioned above).

davinellulinvega commented 3 years ago

@gimait As mentioned in my previous comment I have simply split the existing code into three classes (and fixed the issues discussed above as well). I am still open to suggestion for different structuring, but I wanted to test if putting the display process back into the main thread would fix the video lag issue. Well it did not, so that's that. More digging is required to solve this particular problem. The rest, though, works like a charm. As always any feedback is welcome. Thanks a lot in advance for your time and help.

davinellulinvega commented 3 years ago

@gimait eureka! I have found why the video was laggy. I do not really have the full explanation, but here is the main idea. Basically, the program uses pynput to listen to keyboard events. However, if any of OpenCV's windows is focused, they will also listen for keyboard events. Since we are forced to use the waitKey() function after a call to cv.imshow() it means that any keyboard event will interrupt any GUI "housekeeping" that waitKey() is doing. Therefore, the video will not be showing properly. Two solutions are available:

Tell people to remove the focus from all OpenCV windows when remote controlling Cozmo.
Have pynput suppress key events. This means that only pynput will see the key presses and releases, but not propagate to other programs.

To be honest I do not like solution 1, but, and that is where things get ironical, it just so happen that under Linux (more specifically within a Xorg environment) passing suppress=True to the keyboard listener randomly crashes X and logs people out of their sessions (this is a known issue). So we are in a bit of a pickle so to speak. The third option would be to look for another keyboard library, but to be honest I do like pynput and have not found anything else that could replace it. Any suggestions as to how we might proceed from here?

gimait commented 3 years ago

Oh, I see. Of course, these two are fighting to get the keys. Have you thought of using a different method to display the video? From a quick search, I found you can probably do it with matplotlib or tkinter, but there might be other options. I looked into whether you could do it directly with PIL, but I don't think it's possible, unfortunately.

I think any keyboard library will interfere with waitkey. It is quite a bad design decision from opencv to have only that function as an event manager.

davinellulinvega commented 3 years ago

@gimait Well OpenCV's documentation strikes again. To avoid any problems and the "requirement" of calling waitKey() after each imshow() the solution is to put the window in its own thread. For more information refer to the "extensive" documentation. All joking aside, it now all works beautifully.

davinellulinvega commented 3 years ago

That's the irony of this situation. Everywhere in the documentation you are being told that imshow() has to be followed by either waitKey() or pollKey() (and I have tried, it is indeed required). However, if you manage to stumble upon this one line of documentation, with no explanation to what the function actually does, it turns out that calling cv.startWindowThread() before creating a named window to display the video feed solves all our problems.

gimait commented 3 years ago

I think it's good now, let's see if we can have it merged soon.

@zayfod, when you have time, could you have a look and see what you think about the example?

Stilmant commented 1 year ago

Hello all. Starting to work on pycozmo. Thanks for your works.

I wanted to play a bit with your script but I had nochance nor under windows nor under linux (cozmo) michael@Lux-mstilma-lt:~/cozmo/pycozmo/examples$ python3 opencv_rc.py NvStorageOpResult(tag=NvEntryTag.NVEntry_CameraCalib, length=0, op=NvOperation.NVOP_READ, result=NvResult.NV_OKAY, data=b'6\xcf\x92C\x88\xed\x92Ca+\x1fC\xa7"\xdcB\x00\x00\x00\x00\xf0\x00@\x01\xd7y\x14\xbce\x97]>Q+\t\xbbub\xb7\xbaK&\xe1\xbd\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00') NvStorageOpResult(tag=NvEntryTag.NVEntry_SavedCubeIDs, length=0, op=NvOperation.NVOP_READ, result=NvResult.NV_OKAY, data=b'OMZC\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\xe5>d\xce\x99aG3\x9e\xb8\xe4\xce\xc4\xb2') [1637469796, 3097375559, 2999242468] 0x6199ce64 0xb89e3347 0xb2c4cee4 WARNING:root:Did not get any image from the camera. So not displaying anything new. WARNING:root:Did not get any image from the camera. So not displaying anything new. WARNING:root:Did not get any image from the camera. So not displaying anything new.

Just hoping this ring some bells.

davinellulinvega commented 1 year ago

Have you tried to get an image from Cozmo's camera by other means? The only reason for this script to display this warning is if the queue between the main process and the display thread is empty for more than 0.2 seconds. You could try to either increase the timeout, or remove it completely making the call to _img_queue.get() blocking. Unfortunately, I do not have a Cozmo on hand anymore. Therefore, I will not be able to test if the code is still functioning. For information, though, the script was developed and tested on Linux. I have never tried it on other OSs. Finally, I have opened the issue section on my fork of this repository. It might be a good idea to move this discussion over there.

Stilmant commented 1 year ago

Thanks for your answer and issues section opening. I got the image from Cozmo's camera indeed. (a still photo however) I will play more around with the original stack features and see if I missed something.

zayfod / pycozmo

Remote control car example with OpenCV UI #46