Recording overlay images

geurtlagemaat commented 9 years ago

Hi,

I’m using picamera a lot, and currently I’m using it on a motorcycle to record a race. Using Mapnik, a Python script dynamically draws the racetrack with a symbol indicating the current position according to the received GPS data. This image is displayed over the preview using picamera's camera.add_overlay() (as described on http://picamera.readthedocs.org/en/release-1.9/recipes1.html#overlaying-images-on-the-preview).
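For reference, the 1.9 recipe linked above pads the overlay image before handing it to add_overlay(): the buffer's width is rounded up to a multiple of 32 and its height to a multiple of 16. Below is a minimal, stdlib-only sketch of that padding step. It assumes the rendered track map is already available as packed RGB888 bytes. The helper names `padded_size` and `pad_rgb` are illustrative only and are not part of picamera.

```python
def padded_size(width, height):
    """Round width up to a multiple of 32 and height up to a multiple of 16."""
    return ((width + 31) // 32) * 32, ((height + 15) // 16) * 16


def pad_rgb(data, width, height):
    """Copy packed RGB888 pixels into a zero-filled (black) buffer of the padded size."""
    if len(data) != width * height * 3:
        raise ValueError("expected width * height * 3 bytes of RGB data")
    pw, ph = padded_size(width, height)
    out = bytearray(pw * ph * 3)  # zero bytes: the padding stays black
    row = width * 3
    for y in range(height):
        # Each source row lands at the start of a (wider) padded row.
        out[y * pw * 3 : y * pw * 3 + row] = data[y * row : (y + 1) * row]
    return bytes(out)


# On the Pi (not executed here), roughly following the recipe:
#   overlay = camera.add_overlay(pad_rgb(rgb, w, h), size=(w, h))
#   overlay.layer = 3  # above the preview
```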

As mentioned in http://picamera.readthedocs.org/en/release-1.9/recipes1.html#overlaying-images-on-the-preview, overlay images are not recorded; only text is recorded.

Question: is there a way to have the image overlay recorded with some sort of extension? Could someone point me in the right direction, or give me an idea of how to implement such a function? Is this function feasible, and is it planned for the future?

Regards, Geurt Lagemaat

jtkeyva commented 9 years ago

+1

6by9 commented 9 years ago

Not easily. The overlays are done as the frame is sent to the display. To encode it you'd have to send it to memory instead and record that. There are dispmanx calls that would allow you to do that, but it'll probably be outside the scope of picamera.

waveform80 commented 9 years ago

Hi Geurt,

That sounds like an excellent project! However, just to echo what 6by9 said: this isn't easy. At the moment, when you're using picamera with a preview, an overlay, and a video recording, the underlying MMAL graph of components looks roughly like this:

[Image: diagram of the MMAL component pipeline]

As you've probably guessed, you can't take the output from those renderers and pass it to the encoder (at least not that I'm aware of!). Now having said that, I do have some plans in the pipeline to permit arbitrary images to be passed to an MMAL video encoder (this is largely to support projects like stop motion animation and timelapse video generation which at the moment have to rely on ffmpeg to produce their videos; this is horrendously slow on the Pi as it's all CPU based). I haven't created a ticket to cover these plans yet as it's still not clear to me exactly what this is going to entail doing to picamera's guts (which in turn will dictate whether this is something that can be done in 1.x, or whether it'll have to wait for 2.x because it'll break compatibility somewhere).

Even assuming such a feature becomes available in picamera, images sent "manually" to the encoder would come from memory (i.e. from the CPU to the GPU). This is slow. Sufficiently slow that I doubt you'd get 30 frames a second out of it (maybe on a Pi 2, but I'm still doubtful). So, for the moment, if you want to record an overlay I'm afraid it's something you'll need to do "off Pi". I haven't tried this myself, but I imagine things like the Elgato Game Capture HD might be a tolerably inexpensive option for doing this (i.e. don't have the Pi do any encoding; just get it to render stuff and let the Elgato do the encoding).

Finally, I'll add that the dispmanx calls @6by9 mentions sound intriguing, but if I recall correctly that involves delving into the world of OpenGL which is something I still haven't found time to learn! I don't want to pour cold water on this entirely: if there was a relatively simple way of supporting this in picamera, I'd love to add it, but at the moment I don't think that's going to happen - sorry!

6by9 commented 9 years ago

DispmanX doesn't need to involve OpenGL. vc_dispmanx_snapshot is the call of interest, and there's sample code on the forums for doing still snapshots from the screen (http://www.raspberrypi.org/forums/viewtopic.php?p=376546#p376546). Also https://github.com/AndrewFromMelbourne/raspi2png

You can also register for a vsync callback from DispmanX so you'll know when the screen is refreshed (not the same thing as the content changing, and may well be at 60Hz if you have a moderately posh TV). The sample code captures to RGB888. I believe the hardware could capture to YUV420 which will save a chunk of conversion before encoding. It can also be at a different resolution to the screen as the whole scene is rendered again (NB This steals memory bandwidth and time from the online screen rendering, so may cause tearing on the screen).
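To make the "chunk of conversion" concrete, here is a rough CPU-side sketch of turning packed RGB888 into planar YUV420 (I420), the kind of work that capturing straight to YUV420 would avoid. The coefficients are the full-range BT.601 (JFIF) ones and chroma is averaged over 2x2 blocks. Both are assumptions, and the firmware's own conversion may differ. The function name `rgb888_to_i420` is hypothetical. The output is also half the size of the input (1.5 vs 3 bytes per pixel).

```python
def rgb888_to_i420(rgb, width, height):
    """Convert packed RGB888 to planar YUV420 (Y plane, then U plane, then V plane).

    Assumes even width/height and full-range BT.601 coefficients.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")

    def clamp(v):
        return max(0, min(255, round(v)))

    y_plane = bytearray(width * height)
    u_plane = bytearray((width // 2) * (height // 2))
    v_plane = bytearray((width // 2) * (height // 2))
    for j in range(0, height, 2):
        for i in range(0, width, 2):
            u_sum = v_sum = 0.0
            # Full-resolution luma; chroma accumulated over the 2x2 block.
            for dy in (0, 1):
                for dx in (0, 1):
                    idx = ((j + dy) * width + (i + dx)) * 3
                    r, g, b = rgb[idx], rgb[idx + 1], rgb[idx + 2]
                    y_plane[(j + dy) * width + i + dx] = clamp(
                        0.299 * r + 0.587 * g + 0.114 * b)
                    u_sum += -0.168736 * r - 0.331264 * g + 0.5 * b
                    v_sum += 0.5 * r - 0.418688 * g - 0.081312 * b
            c = (j // 2) * (width // 2) + i // 2
            u_plane[c] = clamp(u_sum / 4 + 128)
            v_plane[c] = clamp(v_sum / 4 + 128)
    return bytes(y_plane + u_plane + v_plane)
```

Doing this per pixel in Python is far too slow for video. The sketch only shows what the conversion involves, not how to do it at speed.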

I'd suggest that at least initially it ought to be a standalone app that will run and encode anything presented to the screen, and then possibly think about integrating it into PiCamera. The overall overhead of doing all this extra composition may be too much for the Pi - it'd be a try it and see job.

(I've had plans for a long time to support a MMAL component to do overlays onto a stream, but it didn't happen whilst I could do it on company time, and my Pi time is limited now. That's not to say it'll never happen though...)

geurtlagemaat commented 9 years ago

6by9, waveform80: thank you for the explanation, pointers etc. I’m trying to understand what you are saying and dig a bit deeper in this subject. Thanks!

waveform80 commented 9 years ago

Many thanks to 6by9 for the info on the dispmanx screenshot calls. I've had a look at these now, and they definitely look worth integrating into picamera. I don't think this is something I'm going to tackle for 1.10, though. The first step should probably be to add a class for screenshotting with the dispmanx calls (possibly similar to the stuff in the picamera.array module), then to implement an external app along the lines 6by9 described above, and finally to implement the callback mechanism in the Python library (I'll have to think carefully about this last part; my hunch is something like the analysis classes in picamera.array, but we'll see).

Anyway, for now I'll mark it as an enhancement without a milestone.

chconnor commented 7 years ago

+1 -- both fingers crossed that some day this could be implemented -- for a security camera application I have a specific region that I want to mask out (a keyboard - to prevent passwords from being recorded) -- even just a specified rectangular or polygonal region to turn black would be great. Thanks for the good work.
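As an illustration of the masking being asked for, here is a CPU-side sketch that blacks out a rectangle in an unencoded planar YUV420 (I420) frame held in a `bytearray`. It is not the firmware feature discussed below. It assumes even frame dimensions and BT.601 "studio swing" black (Y=16, U=V=128), with an option for full-range black (Y=0). The rectangle is widened to even coordinates so it covers whole 2x2 chroma blocks. The name `black_out_i420` is hypothetical.

```python
def black_out_i420(frame, width, height, x, y, w, h, full_range=False):
    """Black out a rectangle in an I420 frame (bytearray), in place."""
    y_black = 0 if full_range else 16  # studio-swing black is Y=16
    # Expand the rectangle outward to even coordinates (whole chroma blocks).
    x0, y0 = x & ~1, y & ~1
    x1 = min(width, (x + w + 1) & ~1)
    y1 = min(height, (y + h + 1) & ~1)
    # Luma plane: one byte per pixel.
    for row in range(y0, y1):
        frame[row * width + x0 : row * width + x1] = bytes([y_black]) * (x1 - x0)
    # Chroma planes: quarter size, neutral value 128 means "no colour".
    cw = width // 2
    u_off = width * height
    v_off = u_off + cw * (height // 2)
    for crow in range(y0 // 2, y1 // 2):
        start, end = crow * cw + x0 // 2, crow * cw + x1 // 2
        frame[u_off + start : u_off + end] = b"\x80" * (end - start)
        frame[v_off + start : v_off + end] = b"\x80" * (end - start)
    return frame
```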

ras-marques commented 7 years ago

+1 -- I am also very interested in some kind of mask that could be applied to the video output for similar reasons to those chconnor presented.

6by9 commented 7 years ago

Blacking out a box or two can be added to the firmware fairly easily as an extra bit of the annotate stage. I'll see what I can do.

ras-marques commented 7 years ago

Thank you! I would like to be able to select a box that would be recorded and the rest was black, maybe the other way around then.

6by9 commented 7 years ago

I would like to be able to select a box that would be recorded and the rest was black, maybe the other way around then.

Surely you're better off using zoom(x, y, w, h) to crop out the bit you're interested in and encode only that. Producing pixels of output to then render most of them back to black is wasted effort.
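In picamera this cropping is exposed as the `camera.zoom` attribute. It takes an `(x, y, w, h)` tuple of floats from 0.0 to 1.0 relative to the full image, not pixel coordinates. A small helper for converting a pixel rectangle, measured against the full field of view (for example on a full-resolution still), might look like this. The name `pixel_roi_to_zoom` is illustrative.

```python
def pixel_roi_to_zoom(full_width, full_height, x, y, w, h):
    """Convert a pixel rectangle in the full field of view into the
    normalized (x, y, w, h) tuple accepted by picamera's camera.zoom."""
    if not (0 <= x and 0 <= y and x + w <= full_width and y + h <= full_height):
        raise ValueError("rectangle lies outside the field of view")
    return (x / full_width, y / full_height, w / full_width, h / full_height)


# On the Pi (not executed here):
#   camera.zoom = pixel_roi_to_zoom(1920, 1080, 480, 270, 960, 540)
```

The cropped region is then scaled to the recording resolution, so choosing a resolution with the same aspect ratio as the rectangle presumably avoids distortion.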

waveform80 commented 7 years ago

To be clear, the stuff I've been working on above is using dispmanx for efficient screenshots; it's not really (directly) to do with recording overlays (more to allow people to do static captures of overlays).

I'm going to try and find some time to document how to use the MMAL encoders fairly easily from picamera's mmalobj layer (so people can try putting the two together if they want) but, for various reasons, I rather doubt this'll be completed particularly soon.

chconnor commented 7 years ago

Thanks 6by9 -- that sounds great. Even just positioning the text overlays would provide a workaround until a more thorough solution, but anything will be appreciated.

ras-marques commented 7 years ago

Surely you're better off using zoom(x, y, w, h) to crop out the bit you're interested in and encode only that. Producing pixels of output to then render most of them back to black is wasted effort.

Haha! You just solved my problem, that's thinking outside... wait... inside of the box! Of course! Thank you for the suggestion. In the future it would be very useful to be able to overlay rectangles and images, but this one suggestion works for me if I can put it to work! :)

waveform80 commented 7 years ago

https://www.youtube.com/watch?v=WnnKrVSMbng

waveform80 commented 7 years ago

As the video above alludes to there's now a (low-level) fix for this. The pi-display branch contains the work for anyone wanting to test it, but beware there's no "high level" API yet - it's all down in the mmalobj layer. On the other hand, it's quite powerful...

Essentially the mmalobj layer now knows about "real" MMAL components and "fake" Python-implemented MMAL components. You can slot 'em into the pipeline like any other component and the connection will auto-negotiate formats and all that, but you get to code exactly what the component does (what formats it takes, how many outputs it has, etc).
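The idea of slotting a Python stage between real components, with formats auto-negotiated on each connection, can be pictured with a toy model. This is a hypothetical sketch and not the mmalobj API or how it implements negotiation. Each stage lists the formats it accepts and produces in order of preference, and each link picks the first format both sides support. The format labels mirror MMAL encoding names but are plain strings here. In the toy, the Python stage doesn't accept OPAQUE, since OPAQUE buffers carry handles to GPU-side data rather than pixels, so the link falls back to I420.

```python
class ToyComponent:
    """Hypothetical stand-in for a pipeline stage (NOT the mmalobj API)."""

    def __init__(self, name, inputs, outputs, process=None):
        self.name = name
        self.inputs = list(inputs)    # formats accepted, in order of preference
        self.outputs = list(outputs)  # formats produced, in order of preference
        self.process = process or (lambda frame: frame)


def negotiate(upstream, downstream):
    """Pick the first format upstream can produce that downstream accepts."""
    for fmt in upstream.outputs:
        if fmt in downstream.inputs:
            return fmt
    raise ValueError(f"{upstream.name} -> {downstream.name}: no common format")


def run(components, frame):
    """Negotiate every link, then push one frame through the chain."""
    formats = [negotiate(a, b) for a, b in zip(components, components[1:])]
    for comp in components:
        frame = comp.process(frame)
    return formats, frame


camera = ToyComponent("camera", [], ["OPAQUE", "I420", "RGB3"])
overlay = ToyComponent("python-overlay", ["I420", "RGB3"], ["I420"], process=str.upper)
encoder = ToyComponent("encoder", ["OPAQUE", "I420"], ["H264"])
formats, result = run([camera, overlay, encoder], "frame")
```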

I've made a start on something a little more high level with the PiArrayTransform class (in the array module) but while that makes it easier to construct a transform, it still requires manual manipulation of the MMAL pipeline to use. Still thinking about the really high level interface to let everyone play with this easily.

Docs are mostly done, and I've just set RTD to build from the pi-display branch, so in a bit you'll be able to see the new mmalobj chapter at http://picamera.readthedocs.io/en/pi-display/api_mmalobj.html - read through that to learn your way around the new stuff!

chconnor commented 7 years ago

Coming to this late -- great news, and thanks for the work! In the video you said that "various factors mean it won't work at high resolutions (720p max, realistically)". Does that mean my dream of a black polygon over a region will not be realized (since my security cams run at max resolution)?

waveform80 commented 7 years ago

Very much depends on the resolution and framerate you want. Basically, the way this works under the hood is that I'm breaking the MMAL pipeline in two and stuffing a chunk of Python in the middle. The major issue is that the unencoded format most people want to work with is RGB (or, even worse, RGBA) which is huge. Internally, the camera firmware just passes around pointers to YUV420 frames but now we're asking it to convert that frame pointer to a full RGB(A) frame, copy the whole thing over to the CPU for Python to fiddle with, then copy the whole thing back, convert it back into YUV420 and get back to the encoding.
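A quick back-of-envelope calculation shows why the unencoded format matters. YUV420 takes 1.5 bytes per pixel, RGB 3 and RGBA 4. Every frame is copied out to the CPU and back again, so the data moved per second is twice the frame size times the framerate. The sketch counts only those two copies and ignores the cost of the format conversions themselves.

```python
BYTES_PER_PIXEL = {"YUV420": 1.5, "RGB": 3, "RGBA": 4}


def frame_bytes(width, height, fmt):
    """Size of one unencoded frame in bytes."""
    return int(width * height * BYTES_PER_PIXEL[fmt])


def round_trip_mb_per_s(width, height, fps, fmt):
    """MB/s copied GPU -> CPU and back again (two copies per frame)."""
    return 2 * frame_bytes(width, height, fmt) * fps / 1e6


for fmt in ("YUV420", "RGB", "RGBA"):
    print(fmt, frame_bytes(1280, 720, fmt),
          round(round_trip_mb_per_s(1280, 720, 30, fmt), 1))
```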

On a Pi3, if you're reasonably careful about the sort of transforms you're doing, you can do overlays at 1280x720@30fps (as demonstrated in the video - all three overlays were composited in realtime from Python). However, that's assuming you're working in YUV420. If you're working in RGB the most I've managed so far is 800x600@30fps.

It's all down to how fast you can move the frame data around (and frankly, with unencoded frames, the answer is "not very fast" no matter what language you're working in). There is a trade-off to be made with framerate, if you want. For example, you could probably do max-resolution overlays at ... maybe 5fps at a guess (for a v1 module, probably less for a v2)?
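The resolution/framerate trade-off can be roughed out by treating throughput as a fixed budget of bytes per second. The sketch calibrates that budget from the 1280x720@30fps YUV420 figure quoted above and assumes cost scales linearly with frame size, ignoring per-frame overhead and the cost of the transform itself. The results are estimates, not measurements. The sensor resolutions used are 2592x1944 for the v1 module and 3280x2464 for the v2.

```python
BYTES_PER_PIXEL = {"YUV420": 1.5, "RGB": 3}


def frame_bytes(width, height, fmt):
    return width * height * BYTES_PER_PIXEL[fmt]


# Rough budget calibrated from the quoted 1280x720@30fps YUV420 result
# (about 41.5 MB of frame data per second).
BUDGET = frame_bytes(1280, 720, "YUV420") * 30


def estimated_max_fps(width, height, fmt, budget=BUDGET):
    """Framerate that would fit in the same byte budget (linear assumption)."""
    return budget / frame_bytes(width, height, fmt)


v1_fps = estimated_max_fps(2592, 1944, "YUV420")  # full v1 sensor resolution
v2_fps = estimated_max_fps(3280, 2464, "YUV420")  # full v2 sensor resolution
rgb_fps = estimated_max_fps(800, 600, "RGB")
```

Under these assumptions the v1 estimate comes out between 5 and 6fps and the v2 estimate between 3 and 4fps, which is in line with the guess above. The 800x600 RGB estimate (just under 30fps) is also consistent with the RGB figure quoted earlier.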

chconnor commented 7 years ago

I take it the normal text overlay is handled in firmware, then?... any chance of hijacking that to position squares (U+25A0) at random places in a high-res frame? Maybe that's getting into the weeds a bit, but more generally it would be very handy to control the position of text overlays...

Re: MMAL pipeline -- I recently dipped my toes into cython and was amazed how well/easily it worked. My ignorance on the subject is large, but it makes me wonder if there'd be a way to use cython to write/compile the custom overlay and then you'd maybe be able to avoid the translation of the data to/from python-world because you could just pass the pointers...

waveform80 commented 7 years ago

Yes, the normal text overlay is firmware based, but as it stands there's no unicode support (just ASCII ... maybe some Latin-1 chars? I haven't tried) and no support for moving it either.
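Given that limitation, one defensive option is to reduce text to printable ASCII before assigning it to picamera's `camera.annotate_text`, so characters like U+25A0 don't end up in the annotation. This is a minimal sketch. The helper name `to_annotation_text` and the choice of replacement character are illustrative. It assumes, per the comment above, that only plain ASCII renders reliably.

```python
def to_annotation_text(text, replacement="?"):
    """Keep printable ASCII (0x20-0x7E) and replace everything else."""
    return "".join(ch if 32 <= ord(ch) < 127 else replacement for ch in text)


# On the Pi (not executed here):
#   camera.annotate_text = to_annotation_text("Lap 3 \u25a0 42 km/h")
```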

As for cython, it might speed up composition a bit ... but not that much. I'm already passing around pointers wherever I can in Python (the ctypes module allows for C-pointers to be used, constructed and manipulated in Python), and for the video I used PIL to do the compositing (which means the actual compositing got handled by a load of C). However, despite the ability to use pointers in Python (or cython), you don't get to play with the frames in the GPU's memory. The way the framework stands at the moment the only way to get frames out of it is to receive them via a callback which copies the data from the GPU's space to the CPU's.
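As a small illustration of the pointer point, the stdlib snippet below shows that a `memoryview` over a ctypes array shares its memory without copying, while converting to `bytes` takes a copy. The ctypes array is just a stand-in for a buffer owned by C code. None of this reaches into GPU memory: in the setup described above, the callback has already copied the frame into CPU-side memory before Python sees it.

```python
import ctypes

# A ctypes array standing in for a frame buffer owned by C code.
frame = (ctypes.c_uint8 * 8)(*range(8))

# A memoryview shares the ctypes array's memory: no copy is made.
view = memoryview(frame).cast("B")
view[0] = 255  # the write goes straight into `frame`

# bytes() (like ctypes.string_at) takes a copy: later changes are not seen.
snapshot = bytes(frame)
frame[1] = 254

print(frame[0], snapshot[1], frame[1])
```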

Now, there is a zero-copy flag in MMAL which, theoretically, I could use to avoid the copying but there's a whole pile of caveats around that (largely to do with Python really wanting to own memory that it uses) which mean it's non-trivial to integrate. Maybe when I get a significant chunk of time (or someone pays for a significant chunk of my time!) I'll get to look into the possibilities of zero-copy but I'm afraid it's just not on the cards at the moment.

chconnor commented 7 years ago

Gotcha. I will continue training myself to not enter my password while my camera is recording. :-) Thanks for the responses, -c

6by9 commented 7 years ago

I take it the normal text overlay is handled in firmware, then?..

Correct. It was a debug feature for those tuning the imaging algorithms, hence why it has options for graphing exposure time, analogue gain, and focus position.

any chance of hijacking that to position squares (U+25A0) at random places in a high-res frame? Maybe that's getting into the weeds a bit, but more generally it would be very handy to control the position of text overlays...

Nope. It's not worth the effort to add general shapes to that processing path. Fixed rectangles might be sensible (as I suggested above).

There are a couple of things that may be worth investigating on this.

1. Using the MMAL zero-copy option will reduce the memory bandwidth required for getting those buffers from the GPU to/from the ARM, but it then requires the buffers to be allocated by MMAL using mmal_port_pool_create (the GPU buffer is mapped into your process's memory map). I don't know whether an externally allocated buffer can easily be used within Python.
2. As noted, conversion to RGB is one of the heavier-weight operations on the GPU, so stick with YUV if at all possible. There may be another option now, though, as a new MMAL component is available: "vc.ril.isp". It accepts almost all YUV or RGB formats as input (as well as OPAQUE), can output most YUV or RGB formats, and uses the ISP hardware block to do the conversion (and resizing, if you want it). The hardware obviously has a limit, and the same hardware block is used to process the camera images, so don't go too mad.
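On the question of whether an externally allocated buffer can be used from Python: ctypes can, in principle, wrap a raw address with `from_address` without copying. The catch is that Python neither owns nor tracks the lifetime of that memory. The sketch below uses a ctypes-allocated buffer as a stand-in for a buffer that MMAL has mapped into the process. Whether this works cleanly with picamera's mmalobj layer and zero-copy pools is untested here.

```python
import ctypes

# Stand-in for memory allocated outside Python (e.g. a buffer that a C
# library has mapped into the process); here we simply allocate it with ctypes.
owner = ctypes.create_string_buffer(16)
address = ctypes.addressof(owner)

# Build a typed view over that address without copying. Python does not
# track the lifetime of this memory: the view is only valid while the real
# allocation (here `owner`) is still alive.
FrameBytes = ctypes.c_uint8 * 16
view = FrameBytes.from_address(address)
view[0] = 0x80

# The write is visible through the original allocation.
print(owner.raw[0])
```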

Writing a GPU side video overlay component is still on my list of things to do, but it's well down the list.

waveform80 / picamera

Recording overlay images #196