raspberrypi / picamera2

New libcamera-based Python library
BSD 2-Clause "Simplified" License

[SUGGESTION] Any plans to implement the picamera "annotate" functionality? #350

Closed · jaysonlarose closed 2 years ago

jaysonlarose commented 2 years ago

The picamera library had support for "baking in" text to the output image via firmware using the annotate_text property. Are there any plans to reimplement this ability in picamera2?

I use this functionality — as well as a little brute force and ignorance — to apply timestamps with 1/100th second precision to video. I'd basically push the updated timestamp text to the camera module 100 times per second as part of my main event loop, and let the chips fall where they may as far as which frames received what timestamp.
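
For reference, the old approach boiled down to something like the sketch below (reconstructed from memory with the legacy picamera library; the resolution, duration, and filename are illustrative):

#!/usr/bin/python3
# Sketch of the legacy picamera approach: keep pushing fresh annotation text
# and let the firmware bake it into whichever frames come next.
import datetime
import time

import picamera

with picamera.PiCamera(resolution=(1600, 1200), framerate=30) as camera:
    camera.start_recording("annotated.h264")
    end = time.monotonic() + 60
    while time.monotonic() < end:
        # Trim %f (microseconds) down to 2 digits for 1/100th second precision.
        now = datetime.datetime.now()
        camera.annotate_text = now.strftime("%Y-%m-%d %H:%M:%S.%f")[:-4]
        time.sleep(0.01)  # ~100 updates per second
    camera.stop_recording()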

Last night, on the same Raspberry Pi 3 Model B, I tried to reimplement the same thing in picamera2 using a similar technique; CPU usage exceeded 100% and capturing at 25+ frames per second was no longer possible.

One caveat with the above statement: I had to implement the code that writes the timestamp text onto the current video frame by rendering via freetype and blitting via numpy, because GTK 3 on Raspberry Pi OS Lite Bullseye is currently broken. Until that's fixed I won't know how cv2's text rendering compares, but I'm concerned about performance: my example code running at 1280x720@30fps produced only 5 minutes 5 seconds worth of frames during 10 minutes of recording, and consumed 110% CPU while doing it. In contrast, the old annotate method worked fine at 1600x1200@30fps, consuming 15% CPU while also serving up the video stream over TLS.

Here's the example code I cobbled together that uses freetype while the GTK 3 packages are broken:

#!/usr/bin/python3
import time

# cv2 disabled because:
#$ date
#Tue 11 Oct 2022 10:43:46 AM PDT

#$ printf "%s\n" "$( cat /proc/device-tree/model )"
#-bash: warning: command substitution: ignored null byte in input
#Raspberry Pi 3 Model B Rev 1.2

#$ lsb_release -a
#No LSB modules are available.
#Distributor ID: Raspbian
#Description:    Raspbian GNU/Linux 11 (bullseye)
#Release:        11
#Codename:       bullseye

#$ sudo apt install python3-opencv
#Reading package lists... Done
#Building dependency tree... Done
#Reading state information... Done
#Some packages could not be installed. This may mean that you have
#requested an impossible situation or if you are using the unstable
#distribution that some required packages have not yet been created
#or been moved out of Incoming.
#The following information may help to resolve the situation:
#
#The following packages have unmet dependencies:
# libgtk-3-0 : Depends: libwayland-client0 (>= 1.20.0) but 1.18.0-2~exp1.1 is to be installed
#E: Unable to correct problems, you have held broken packages.

#import cv2

import os

# NOTE: point this to a .ttf file that actually exists on your system!
FONT_FILE = os.path.join(os.environ['HOME'], ".local/share/fonts/alarm_clock_7mod.ttf")
FONT_SIZE = 16

from picamera2 import MappedArray, Picamera2
from picamera2.encoders import H264Encoder
from picamera2.outputs import FileOutput

# (blit() and paste() lifted from https://stackoverflow.com/questions/28676187/numpy-blit-copy-part-of-an-array-to-another-one-with-a-different-size)

import numpy as np

def blit(a, b, offsets=(0,), as_shapes=False):
    """
    Computes the slices of the overlapping regions of arrays <a> and <b>. If offsets are specified,
    <b> will be shifted by these offsets before computing the overlap.

    Example:
          50
       ┌──────┐
       │      │
     65│  ┌───┼────┐
       │  │   │    │50
       └──┼───┘    │
          └────────┘
              55
    <a> is the 65x50 array and <b> is the 50x55 array. The offsets are (32, 18). The returned
    slices are [32:65, 18:50] for <a> and [0:33, 0:32] for <b>.

    Arrays of different dimensions can be used (e.g. a 3-dimensional RGB image and a 2-dimensional
    grayscale image) but the slices will only go up to min(a.ndim, b.ndim). An offset with more
    elements than that will raise a ValueError.

    Instead of arrays, shapes can be passed directly to the function by setting as_shapes to True.

    :param a: an array object, or a shape tuple if as_shapes is True
    :param b: an array object, or a shape tuple if as_shapes is True
    :param offsets: a sequence of offsets
    :param as_shapes: if True, <a> and <b> are expected to be array shapes rather than arrays
    :return: a multidimensional slice for <a> followed by a multidimensional slice for <b>
    """

    # Retrieve and check the array shapes and offset
    if not as_shapes:
        a, b = np.array(a, copy=False), np.array(b, copy=False)
        a_shape, b_shape = a.shape, b.shape
    else:
        a_shape, b_shape = a, b
    n = min(len(a_shape), len(b_shape))
    if n == 0:
        raise ValueError("Cannot overlap with an empty array")
    offsets = tuple(offsets) + (0,) * (n - len(offsets))
    if len(offsets) > n:
        raise ValueError("Offset has more elements than either number of dimensions of the arrays")

    # Compute the slices
    a_slices, b_slices = [], []
    for a_size, b_size, offset in zip(a_shape, b_shape, offsets):
        a_min = max(0, offset)
        a_max = min(a_size, max(b_size + offset, 0))
        b_min = max(0, -offset)
        b_max = min(b_size, max(a_size - offset, 0))
        a_slices.append(slice(a_min, a_max))
        b_slices.append(slice(b_min, b_max))

    return tuple(a_slices), tuple(b_slices)

def paste(a, b, offsets=(0,), copy=True):
    """
    Pastes array <b> into array <a> at position <offsets>

    :param a: an array object
    :param b: an array object
    :param offsets: the position in <a> at which <b> is to be pasted
    :param copy: whether to paste <b> in <a> or in a copy of <a>
    :return: either <a> or a copy of <a> with <b> pasted on it
    """

    out = np.array(a, copy=copy)
    a_slice, b_slice = blit(a, b, offsets)
    out[a_slice] = b[b_slice]
    return out

import numpy
import freetype

def bits(byte):
    """
    Unpack a byte into its 8 bits, most significant bit first.

    Needed by the grayscale=False path in render_numpy(), which receives
    1-bit-per-pixel bitmaps from FT_LOAD_TARGET_MONO.
    """
    out = []
    for _ in range(8):
        out.insert(0, byte & 1)
        byte >>= 1
    return out

def render_numpy(face, text, grayscale=True):
        flags = freetype.FT_LOAD_RENDER
        if not grayscale:
                flags |= freetype.FT_LOAD_TARGET_MONO
        pen = freetype.FT_Vector(0, 0)
        xmin, xmax = 0, 0
        ymin, ymax = 0, 0
        # Previous character, used for kerning
        previous = 0
        for char in text:
                face.load_char(char, flags)
                kerning = face.get_kerning(previous, char)
                previous = char
                pen.x += kerning.x
                x0 = (pen.x >> 6) + face.glyph.bitmap_left
                x1 = x0 + face.glyph.bitmap.width
                y0 = (pen.y >> 6) - (face.glyph.bitmap.rows - face.glyph.bitmap_top)
                y1 = y0 + face.glyph.bitmap.rows
                xmin, xmax = min(xmin, x0), max(xmax, x1)
                ymin, ymax = min(ymin, y0), max(ymax, y1)
                pen.x += face.glyph.advance.x
                pen.y += face.glyph.advance.y

        canvas = numpy.zeros((ymax - ymin, xmax - xmin), dtype=numpy.ubyte)
        previous = 0
        pen.x, pen.y = (0, 0)
        for char in text:
                face.load_char(char, flags)
                kerning = face.get_kerning(previous, char)
                previous = char
                pen.x += kerning.x
                x = (pen.x >> 6) - xmin + face.glyph.bitmap_left
                y = (pen.y >> 6) - ymin - (face.glyph.bitmap.rows - face.glyph.bitmap_top)
                data = []
                for i in range(face.glyph.bitmap.rows):
                        if not grayscale:
                                row = []
                                for j in range(face.glyph.bitmap.pitch):
                                        row.extend(bits(face.glyph.bitmap.buffer[i * face.glyph.bitmap.pitch + j]))
                                data.extend(row[:face.glyph.bitmap.width])
                        else:
                                data.extend(face.glyph.bitmap.buffer[i * face.glyph.bitmap.pitch:i * face.glyph.bitmap.pitch + face.glyph.bitmap.width])
                if len(data):
                        Z = numpy.array(data, dtype=numpy.ubyte).reshape(face.glyph.bitmap.rows, face.glyph.bitmap.width)
                        canvas[y:y + face.glyph.bitmap.rows, x:x + face.glyph.bitmap.width] |= Z[::-1, ::1]
                pen.x += face.glyph.advance.x
                pen.y += face.glyph.advance.y
        canvas = numpy.flip(canvas, 0)
        canvas = numpy.repeat(canvas.reshape(-1), 3).reshape(*canvas.shape, 3)
        return canvas

def format_timestamp(dt, omit_tz=False, alt_tz=False, precision=6):# {{{
        # doc {{{
        """\
        Takes a timezone-aware datetime object and makes it look like:

        2019-01-21 14:38:21.123456 PST

        Or, if you call it with omit_tz=True:

        2019-01-21 14:38:21.123456

        The precision parameter controls how many digits past the decimal point you
        get. 6 gives you all the microseconds, 0 avoids the decimal point altogether
        and you just get whole seconds.
        """
        # }}}
        tz_format = "%Z"
        if alt_tz:
                tz_format = "%z"
        timestamp_txt = dt.strftime("%F %T")
        if precision > 0:
                timestamp_txt = "{}.{}".format(timestamp_txt, "{:06d}".format(dt.microsecond)[:precision])
        if not omit_tz and dt.tzinfo is not None:
                timestamp_txt = "{} {}".format(timestamp_txt, dt.strftime(tz_format))
        return timestamp_txt
# }}}

def now_tzaware():# {{{
        # doc {{{
        """
        Convenience function, equivalent to
        `datetime.datetime.now(tz=pytz.reference.Local)`
        """
        # }}}
        import pytz.reference, datetime
        return datetime.datetime.now(tz=pytz.reference.Local)
# }}}

if __name__ == '__main__':
    picam2 = Picamera2()
    picam2.configure(picam2.create_video_configuration())

    #colour = (0, 255, 0)
    #origin = (0, 30)
    #font = cv2.FONT_HERSHEY_SIMPLEX
    #scale = 1
    #thickness = 2

    face = freetype.Face(FONT_FILE)
    face.set_char_size(FONT_SIZE * 64)

    def apply_timestamp(request):
        timestamp_text = format_timestamp(now_tzaware(), precision=2)
        canvas = render_numpy(face, timestamp_text)
        #timestamp = time.strftime("%Y-%m-%d %X")
        with MappedArray(request, "main") as m:
            #cv2.putText(m.array, timestamp, origin, font, scale, colour, thickness)
            paste(m.array, canvas, (10, 10), copy=False)

    picam2.pre_callback = apply_timestamp

    encoder = H264Encoder(10000000)

    picam2.start_recording(encoder, "test.h264")
    time.sleep(600)
    picam2.stop_recording()

I assume a Raspberry Pi 4 is fast enough to run this in real time, but the Raspberry Pi 3 is not, so I'm concerned that the legacy interface is slated for removal while the new library isn't capable of doing the same thing.

davidplowman commented 2 years ago

Hi, I've never been very happy with the Python libraries I've found for rendering text onto images, though the least bad option seems to be OpenCV, which has always worked fine for me. If you're having trouble with it, please report it on one of the forums; it will obviously never get fixed otherwise! Can you say whether there is anything "unusual" about your configuration? Does it work on a clean install of the latest Bullseye image with no other modifications?

Another thing I contemplated was using PIL. It renders fonts easily enough, but I could never get it to work "in place", so I had to copy out the piece of the image where the text goes, write the text, and then copy it back. Might that work better?
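
Roughly what I mean, as a sketch (the region size, origin, and font here are placeholders, and the mapped array is assumed to have at least 3 channels):

import numpy as np
from PIL import Image, ImageDraw, ImageFont

def draw_text_region(array, text, origin=(10, 10), size=(400, 40)):
    x, y = origin
    w, h = size
    # Copy the region out, since PIL won't draw into the mapped buffer in place.
    region = array[y:y + h, x:x + w, :3].copy()
    img = Image.fromarray(region)
    ImageDraw.Draw(img).text((0, 0), text, fill=(255, 255, 255),
                             font=ImageFont.load_default())
    # ...then copy the annotated region back.
    array[y:y + h, x:x + w, :3] = np.asarray(img)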

jaysonlarose commented 2 years ago

I didn't bother reporting it because it seemed to be a pretty widespread issue: https://forums.raspberrypi.com/viewtopic.php?t=340631. I was installing onto a brand new installation of Bullseye. Anyways, I gave things another go last night, and the package problem has apparently been sorted out, because this time around I was able to get python3-opencv installed without a hitch.

cv2.putText appears to be a LOT more performant than my freetype + numpy blitting method. Processing a 1600x1200@30fps Raspberry Pi Camera Module v2 feed on my Raspberry Pi 3 Model B resulted in an average CPU usage (as reported by top) of about 60% with cv2.putText; disabling the pre_callback hook brought the average down to about 40%. That's nowhere near the 15% I got with picamera on the legacy stack, but the only things this Raspberry Pi needs to do are serve up video and process GPS data for an NTP server, so that's fine.
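
For reference, the cv2 version of the callback is essentially the commented-out cv2 path from my script above, re-enabled; a minimal standalone version looks something like this:

#!/usr/bin/python3
import time

import cv2

from picamera2 import MappedArray, Picamera2
from picamera2.encoders import H264Encoder

picam2 = Picamera2()
picam2.configure(picam2.create_video_configuration())

colour = (0, 255, 0)
origin = (0, 30)
font = cv2.FONT_HERSHEY_SIMPLEX
scale = 1
thickness = 2

def apply_timestamp(request):
    timestamp = time.strftime("%Y-%m-%d %X")
    with MappedArray(request, "main") as m:
        # Draw the timestamp directly into the mapped frame buffer.
        cv2.putText(m.array, timestamp, origin, font, scale, colour, thickness)

picam2.pre_callback = apply_timestamp

picam2.start_recording(H264Encoder(10000000), "test.h264")
time.sleep(600)
picam2.stop_recording()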