roflcoopter / viseron

Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
MIT License

EdgeTPU classification post processor #312

Closed Swiftnesses closed 1 year ago

Swiftnesses commented 2 years ago

Good day,

Currently using Frigate, but would love to try some of the other coral models available.

For example, my kids would love to know what birds are visiting the feeder (currently monitored via a UniFi camera with RTSP feeds available). Is it possible to use the MobileNet V2 model from this page?

https://coral.ai/models/image-classification/

If so, could you provide some pointers to get me started? I’ll be installing via Docker on Debian and have a PCI TPU card. The host is a NUC with Intel Quick Sync available.

Many thanks in advance.

roflcoopter commented 2 years ago

Hmm, I don't think that would work; that's an image classification model, not an object detection one.

Is your camera very zoomed in on the feeder? If so, it could possibly work.

Swiftnesses commented 2 years ago

@roflcoopter the camera is very close to the feeding tray.

I spent the night working with OpenCV, using the bbox coordinates to crop the image to a working copy, which I then pass to the classifier code. It works, but I'm not sure it's ideal, as the resized images are based purely on the detection box size and don't respect the aspect ratio or MobileNet's 224x224 input size.

I'm new to this (and Python) but trying! If I could just work out a simple way to resize the images based on the box size, I'd be golden!
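
For what it's worth, one minimal way to keep the aspect ratio is to pad the detection box out to a square before cropping and only then resize to the classifier's 224x224 input. A rough sketch, assuming an OpenCV (BGR) frame and a pycoral bbox; the crop_square helper is hypothetical:

import cv2

def crop_square(frame, bbox, size=224):
  """Expand the bbox to a square, clamp it to the frame, then resize for the classifier."""
  h, w = frame.shape[:2]
  box_w = int(bbox.xmax - bbox.xmin)
  box_h = int(bbox.ymax - bbox.ymin)
  side = max(box_w, box_h)
  cx = int(bbox.xmin) + box_w // 2
  cy = int(bbox.ymin) + box_h // 2
  # Clamp the square to the frame; near the edges the crop may end up slightly
  # non-square, which cv2.resize still handles.
  x0 = max(0, cx - side // 2)
  y0 = max(0, cy - side // 2)
  x1 = min(w, x0 + side)
  y1 = min(h, y0 + side)
  return cv2.resize(frame[y0:y1, x0:x1], (size, size))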

roflcoopter commented 2 years ago

Ahh so you are running object detection first, which finds a bird, then you pass that image to the classifier?

Swiftnesses commented 2 years ago

@roflcoopter trying to, yes, exactly!

Swiftnesses commented 2 years ago

I've finally got some working code. I've never written Python before, so I'm sure it's a nightmare!

import argparse
import time

from PIL import Image
from PIL import ImageDraw

from pycoral.adapters import common
from pycoral.adapters import detect
from pycoral.adapters import classify
from pycoral.utils.dataset import read_label_file
from pycoral.utils.edgetpu import make_interpreter

# ADDED
import requests
import numpy as np

import urllib.request
import cv2
# END ADDED

# HELPERS
def draw_objects(draw, objs, labels):
  """Draws the bounding box and label for each object."""
  for obj in objs:
    bbox = obj.bbox
    draw.rectangle([(bbox.xmin, bbox.ymin), (bbox.xmax, bbox.ymax)],
                   outline='red')
    draw.text((bbox.xmin + 10, bbox.ymin + 10),
              '%s\n%.2f' % (labels.get(obj.id, obj.id), obj.score),
              fill='red')
# END HELPERS

def main():
  parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
  parser.add_argument('-f', '--file', help='File path of image to process')
  parser.add_argument('-u', '--url', help='URL to input file')
  parser.add_argument('-dm', '--detection_model', required=True, help='File path of .tflite file')
  parser.add_argument('-dl', '--detection_labels', help='File path of labels file')
  parser.add_argument('-cm', '--classification_model', required=True, help='File path of .tflite file')
  parser.add_argument('-cl', '--classification_labels', help='File path of labels file')
  parser.add_argument('-t', '--threshold', type=float, default=0.4, help='Score threshold for detected objects')
  parser.add_argument('-o', '--output', help='File path for the result image with annotations')
  parser.add_argument('-c', '--count', type=int, default=5, help='Number of times to run inference')
  args = parser.parse_args()

  labels = read_label_file(args.detection_labels) if args.detection_labels else {}
  interpreter = make_interpreter(args.detection_model)
  interpreter.allocate_tensors()

  # BEGIN Check input variables
  if args.file and args.url:
    print('Only one input can be provided!')
    exit()

  if not args.file and not args.url:
    print('At least one input (--file / --url) must be provided!')
    exit()

  if args.url is not None:
    print('URL provided')
    urllib.request.urlretrieve(args.url, "/data/detect.jpeg")
    image = Image.open("/data/detect.jpeg")

  if args.file is not None:
    print('File location provided')

    image = Image.open(args.file)
    image.save("/data/detect.jpeg")
    image = Image.open("/data/detect.jpeg")
  # END Check input variables

  _, scale = common.set_resized_input(
      interpreter, image.size, lambda size: image.resize(size, Image.ANTIALIAS))

  print('----DETECT INFERENCE TIME----')
  print('Note: The first inference is slow because it includes',
        'loading the model into Edge TPU memory.')
  for _ in range(args.count):
    start = time.perf_counter()
    interpreter.invoke()
    inference_time = time.perf_counter() - start
    objs = detect.get_objects(interpreter, args.threshold, scale)
    print('%.2f ms' % (inference_time * 1000))

  print('-------DETECT RESULTS--------')
  if not objs:
    print('No objects detected')

  for obj in objs:
    print(labels.get(obj.id, obj.id))
    print('  id:    ', obj.id)
    print('  score: ', obj.score)
    print('  bbox:  ', obj.bbox)

    xmins = obj.bbox.xmin
    ymins = obj.bbox.ymin
    xmaxs = obj.bbox.xmax
    ymaxs = obj.bbox.ymax

    if labels.get(obj.id, obj.id) == 'bird':
        print("It's a" + " " + labels.get(obj.id, obj.id))

        image_to_crop = cv2.imread("/data/detect.jpeg")
        image_cropped = image_to_crop[ymins:ymaxs, xmins:xmaxs]
        cv2.imwrite("/data/detect_cropped.jpeg", image_cropped)

        classification("/data/detect_cropped.jpeg", args.classification_model, args.classification_labels)

  if args.output:
    image = image.convert('RGB')
    draw_objects(ImageDraw.Draw(image), objs, labels)
    image.save(args.output)
    # image.show()

def classification(img, classification_model, classification_labels):
  print(img)
  input_mean = 128.0
  input_std = 128.0
  count = 5
  top_k = 1
  threshold = 0.0
  labels = read_label_file(classification_labels) if classification_labels else {}

  interpreter = make_interpreter(*classification_model.split('@'))
  interpreter.allocate_tensors()

  # Model must be uint8 quantized
  if common.input_details(interpreter, 'dtype') != np.uint8:
    raise ValueError('Only support uint8 input type.')

  size = common.input_size(interpreter)
  image = Image.open(img).convert('RGB').resize(size, Image.ANTIALIAS)

  # Image data must go through two transforms before running inference:
  # 1. normalization: f = (input - mean) / std
  # 2. quantization: q = f / scale + zero_point
  # The following code combines the two steps as such:
  # q = (input - mean) / (std * scale) + zero_point
  # However, if std * scale equals 1, and mean - zero_point equals 0, the input
  # does not need any preprocessing (but in practice, even if the results are
  # very close to 1 and 0, it is probably okay to skip preprocessing for better
  # efficiency; we use 1e-5 below instead of absolute zero).
  params = common.input_details(interpreter, 'quantization_parameters')
  scale = params['scales']
  zero_point = params['zero_points']
  mean = input_mean
  std = input_std
  if abs(scale * std - 1) < 1e-5 and abs(mean - zero_point) < 1e-5:
    # Input data does not require preprocessing.
    common.set_input(interpreter, image)
  else:
    # Input data requires preprocessing
    normalized_input = (np.asarray(image) - mean) / (std * scale) + zero_point
    np.clip(normalized_input, 0, 255, out=normalized_input)
    common.set_input(interpreter, normalized_input.astype(np.uint8))

  # Run inference
  print('----CLASSIFICATION INFERENCE TIME----')
  print('Note: The first inference on Edge TPU is slow because it includes',
        'loading the model into Edge TPU memory.')
  for _ in range(count):
    start = time.perf_counter()
    interpreter.invoke()
    inference_time = time.perf_counter() - start
    classes = classify.get_classes(interpreter, top_k, threshold)
    print('%.1fms' % (inference_time * 1000))

  print('-------CLASSIFICATION RESULTS--------')
  for c in classes:
    print('%s: %.5f' % (labels.get(c.id, c.id), c.score))

if __name__ == '__main__':
  main()
roflcoopter commented 2 years ago

I see, that is exactly how post processors work in Viseron (face recognition, for instance). When an object is detected, you can mark that label (bird) to be sent to a specific post processor.

The post processor gets both the original image and the cropped bbox image to work with.

So this is basically a plug-and-play thing since it fits the architecture. I am, however, working on a huge rewrite, so it will take a little while before I can implement it.

roflcoopter commented 2 years ago

One issue here, though, is that the TPU only holds one model in memory at a time, so it would flip-flop between object detection and classification, making it very slow.

A solution to that is to use multiple TPUs, or a different object detector.
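
For reference, pycoral's make_interpreter takes a device argument (e.g. "pci:0", "pci:1", "usb:0"), so with two Edge TPUs each model could stay resident on its own device. A minimal sketch, assuming two PCIe TPUs; the model paths are placeholders:

from pycoral.utils.edgetpu import make_interpreter

# Pin the detector and the classifier to separate Edge TPUs so neither model
# has to be swapped in and out of TPU memory between frames.
detector = make_interpreter("ssd_mobilenet_edgetpu.tflite", device="pci:0")
classifier = make_interpreter("mobilenet_inat_bird_edgetpu.tflite", device="pci:1")
detector.allocate_tensors()
classifier.allocate_tensors()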

Swiftnesses commented 2 years ago

@roflcoopter I have tried Viseron, but it was a little complex for me right now.

My code above seems to work reasonably well, any comments?

I don't envision too much activity, well, unless they're hungry!

All this so the kids can see what's visiting the garden 😄

roflcoopter commented 2 years ago

Looks fine to me! I'll change the issue title so I remember to implement this as a post processor.

Swiftnesses commented 2 years ago

@roflcoopter I managed to make a quick Flask API, so I can send the arguments to the server and get the classification info back - not bad for a few days' learning. It's slow and a bit dumb, as I need to keep refreshing the snapshot URL of my camera (UniFi), but it works (or at least it should; no notifications since going live!).

I was considering trying to get streaming video working, but likely too much work for me, given my knowledge!

Will look out for a new release and transition over if/when you complete it. Ta.

Swiftnesses commented 2 years ago

@roflcoopter, a quick question you can perhaps help me with while I wait for the rewrite.

I've changed my code to use an RTSP feed instead of a JPEG snapshot URL - it works fine.

Sadly the CPU usage is insane, as it doesn't use FFmpeg VAAPI. I initialise the stream using:

cap = cv2.VideoCapture(args.rtsp_stream, cv2.CAP_FFMPEG)

Would you happen to know how to make OpenCV use ffmpeg's "-hwaccel vaapi"?

roflcoopter commented 2 years ago

Sorry, I missed replying here. You would have to build ffmpeg with hwaccel support and then build OpenCV against your custom ffmpeg.
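
An alternative that avoids rebuilding OpenCV is to let an ffmpeg subprocess do the hardware-accelerated decoding and read raw frames over a pipe, which is roughly how Viseron itself consumes streams. A rough sketch, assuming a 1920x1080 stream and a VAAPI render node at /dev/dri/renderD128; the URL and flags are placeholders and may need adjusting:

import subprocess
import numpy as np

WIDTH, HEIGHT = 1920, 1080
cmd = [
    "ffmpeg",
    "-hwaccel", "vaapi",
    "-hwaccel_device", "/dev/dri/renderD128",
    "-i", "rtsp://user:pass@192.168.1.3:7447/bird_feeder",
    "-f", "rawvideo",
    "-pix_fmt", "bgr24",
    "pipe:1",
]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

frame_size = WIDTH * HEIGHT * 3
while True:
    raw = proc.stdout.read(frame_size)
    if len(raw) < frame_size:
        break  # stream ended or ffmpeg exited
    # 'frame' is an ordinary BGR array, the same shape cv2.VideoCapture would return
    frame = np.frombuffer(raw, np.uint8).reshape((HEIGHT, WIDTH, 3))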

roflcoopter commented 2 years ago

@Swiftnesses not sure if you are still interested in this, but I have now included it in the v2 rewrite.

Swiftnesses commented 2 years ago

@roflcoopter that sounds amazing - so I'll essentially be able to use it for my bird recognition use case?

roflcoopter commented 2 years ago

Yes exactly!

Swiftnesses commented 2 years ago

Oh, this is amazing. Thank you!

Swiftnesses commented 1 year ago

@roflcoopter believe it or not, I'm just coming back to this.

It appears that I cannot use my PCIe Edge TPU device (NUC) for both the object detector AND the image classification (and swap between them as discussed). If I set one of them to CPU, it works.

Any ideas?

Swiftnesses commented 1 year ago

OK, I have this working with the following config:

# See the README for the full list of configuration options.

ffmpeg:
  camera:
    bird_feeder:
      name: Bird Feeder
      host: 192.168.1.3
      port: 7447
      path: /REDACTED
      width: 1920
      height: 1080
      fps: 30
      # codec: h264
      # audio_codec: aac

edgetpu:
  object_detector:
    model_path: /detectors/models/custom/edgetpu/object_detection/tf2_ssd_mobilenet_v2_coco17_ptq_edgetpu.tflite
    label_path: /detectors/models/custom/edgetpu/object_detection/labels.txt
    device: pci
    cameras:
      bird_feeder:
        fps: 30
        scan_on_motion_only: false
        labels:
          - label: bird
            confidence: 0.5
            trigger_recorder: true
  image_classification:
    model_path: /detectors/models/custom/cpu/image_classification/mobilenet_v2_1.0_224_inat_bird_quant.tflite
    label_path: /detectors/models/custom/cpu/image_classification/labels.txt
    device: cpu
    cameras:
      bird_feeder:
        labels:
          - bird
    labels:
      - person

nvr:
  bird_feeder:

# MQTT is optional
mqtt:
  broker: 192.168.1.16
  port: 1883
  username: REDACTED
  password: REDACTED

The issue I'm having is that the image classification is extremely poor: confidence levels never exceed 10% and the results are always wrong. Going back to my code above, they're perfect.

I'm so close to replacing my terrible code - I love the MQTT implementation here too, so much potential!

Swiftnesses commented 1 year ago

I am using https://coral.ai/models/image-classification/ - iNaturalist 2017 (Birds), same as my code above.

Swiftnesses commented 1 year ago

@roflcoopter,

Looking at this further, it appears to be related to the preprocessing of the image before classification (my code was largely taken from Google's repo). If I skip the image preprocessing in my code, I also get extremely poor results.

I THINK it's related to some of this?

def classification(classification_model, classification_labels, threshold, set_output_processed_classified, set_output_processed_cropped, cropped_image):
    # Horrible workaround to check if image exists (need to fix!)
    if not cropped_image.size > 2: # np.shape(cropped_image) == () | cropped_image is None | cropped_image.size>2
        logging.info("***Shit***, the classification image isn't valid, [index: " + str(variables.detect_index) + "].")
    else:
        logging.info("Great, the classification image is valid, [index: " + str(variables.detect_index) + "].")

        input_mean = 128.0
        input_std = 128.0
        count = 1
        top_k = 1

        logging.debug("Loading {} with {} labels.".format(classification_model, classification_labels))
        interpreter_classification = make_interpreter(classification_model)
        interpreter_classification.allocate_tensors()
        labels_classification = read_label_file(classification_labels) if classification_labels else {}
        inference_size_classification = common.input_size(interpreter_classification)

        # Model must be uint8 quantized
        if common.input_details(interpreter_classification, "dtype") != np.uint8:
            raise ValueError("***Shit***, the classification model only supports uint8 input types.")

        classification_image = cv2.cvtColor(cropped_image, cv2.COLOR_BGR2RGB)
        classification_image = cv2.resize(classification_image, inference_size_classification)

        params = common.input_details(interpreter_classification, "quantization_parameters")
        scale = params["scales"]
        zero_point = params["zero_points"]
        mean = input_mean
        std = input_std
        if abs(scale * std - 1) < 1e-5 and abs(mean - zero_point) < 1e-5:
            # Input data does not require preprocessing
            common.set_input(interpreter_classification, classification_image)
        else:
            # Input data requires preprocessing
            normalized_input = (np.asarray(classification_image) - mean) / (std * scale) + zero_point
            np.clip(normalized_input, 0, 255, out=normalized_input)
            common.set_input(interpreter_classification, normalized_input.astype(np.uint8))

        # Run inference on TPU
        logging.debug("----CLASSIFICATION INFERENCE TIME----")
        logging.debug("Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.")
        for _ in range(count):
            start = time.perf_counter()
            interpreter_classification.invoke()
            inference_time = time.perf_counter() - start
            classification_objs = classify.get_classes(interpreter_classification, top_k, threshold)
            logging.debug("%.1fms" % (inference_time * 1000))
Swiftnesses commented 1 year ago

Seems related to my classification model being quantized?

roflcoopter commented 1 year ago

Thanks for your detailed report!

Very interesting findings; it seems the EdgeTPU API has changed quite a bit compared to the example I coded against. Also, this example does it a bit differently than both you and me. There might be some magic going on in the run_inference method, which I am not using today.

Looking into it!

Swiftnesses commented 1 year ago

@roflcoopter thank you :)

I noticed that my processed images have very odd colours if I save them (using the code above, which was from an example somewhere). If I comment out the normalisation piece of my code they're normal, but detection is almost useless. That's likely not related, as the example you linked doesn't appear to do anything special, again unless it relates to my model being quantised; I have no idea about that, to be honest.

Let me know if you want me to test anything.
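
(One likely explanation for the odd colours when saving, separate from the confidence issue: in the code above the crop is converted from BGR to RGB before inference, and cv2.imwrite expects BGR, so saving that RGB array swaps the red and blue channels. Converting back just for the save, e.g. cv2.imwrite(path, cv2.cvtColor(classification_image, cv2.COLOR_RGB2BGR)) where path is whatever output file is used, should give normal-looking images without affecting inference.)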

roflcoopter commented 1 year ago

It's all a bit unclear to me as well. On the Coral models page it says this:

Beware that the EfficientNet family of models have unique input quantization values (scale and zero-point) that you must use when preprocessing your input. For example preprocessing code, see the classify_image.py or classify_image.cc examples.

But they don't mention the iNaturalist models being quantized.

Swiftnesses commented 1 year ago

The only giveaway is “quant” in the model name…

Swiftnesses commented 1 year ago

That does explain why my code works - I took my code snippet from the links you used. So essentially, are we saying that if it's a quantised model we need to preprocess it differently? Perhaps we could just have a flag to indicate the model type, or am I oversimplifying things due to lack of experience?!

Swiftnesses commented 1 year ago

But then this example doesn't do anything special, seems to work, and uses the same model. Urgh.

https://github.com/google-coral/edgetpu/blob/master/examples/classify_image.py

roflcoopter commented 1 year ago

But then this example doesn't do anything special, seems to work, and uses the same model. Urgh.

https://github.com/google-coral/edgetpu/blob/master/examples/classify_image.py

Oh so this example works well for you?

Swiftnesses commented 1 year ago

Ha. I haven't tried it, just took it for granted.

roflcoopter commented 1 year ago

Ahh, that example uses the old, deprecated edgetpu API (it's called pycoral these days), so that is not a good idea to implement. However, it works fine with the code provided in https://github.com/google-coral/pycoral/blob/9972f8e/examples/classify_image.py; no need for a flag or anything, since it figures out on its own whether preprocessing is needed.

I already do other preprocessing, so I just have to add this. Will fix and push soon.

Swiftnesses commented 1 year ago

Gotcha, that's basically the code I use; I see now that it decides whether preprocessing is required.

Can't wait to test it!

Swiftnesses commented 1 year ago

BTW, is the classification data only output via MQTT (and subsequently to Home Assistant)? I kind of expected it to be included in the recording or snapshot, but I only see the object on the snapshot and nothing on the recordings; just wondering if I'm missing something!

Really appreciate your help on this, thank you :)

roflcoopter commented 1 year ago

It is not included in the recording; post processors run after an object is detected but are completely detached from the recorder.

So yes it is only sent over MQTT.

I just pushed #400 which will be in the dev tag shortly!

roflcoopter commented 1 year ago

The build is complete now, please test it out when you get the chance!

Swiftnesses commented 1 year ago

@roflcoopter, no luck I'm afraid - it now detects different birds and background (much more than before), but it's never the right bird! To test both my code and yours, I simply hold up a picture of a European Blue Tit on my phone! My code gets it 95% of the time.

roflcoopter commented 1 year ago

Hmm interesting, could you link the image you are testing?

Swiftnesses commented 1 year ago

Sure, I just used the image attached; the other picture is the resulting push notification on my phone.

IMG_5787

IMG_5786

Swiftnesses commented 1 year ago

Here is my current working code for reference if it helps:

# General
import argparse
import time
import numpy as np
import requests
from datetime import datetime

# OpenCV
import cv2

# Logging
import logging
import os
import sys
logging.basicConfig(
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=os.environ.get("LOGLEVEL", "INFO"),
    stream=sys.stdout,
)

# PYCoral requirements
from pycoral.utils.dataset import read_label_file
from pycoral.utils.edgetpu import make_interpreter
from pycoral.utils.edgetpu import run_inference
from pycoral.adapters import common
from pycoral.adapters import classify
from pycoral.adapters import detect

# Required for health check thread
from threading import Timer

# Improves capture speed
import threading
from queue import Queue

# Imported function to send email to Zapier
from send_email import send_email

# Set directory paths (relative)
script_directory = os.path.dirname(os.path.realpath(__file__))
processed_directory = "processed"

# START MAIN SCRIPT

class variables():
    bird_tracker = {}
    result_bird = None # {}
    detect_sucess = None # False
    detect_index = None # 0
    current_time = None # datetime.utcnow().strftime("%Y-%m-%d_%H-%M-%S-%f")[:-3]

# START HEARTBEAT SENDER
def heartbeat():
    Timer(60, heartbeat).start()
    url = "http://redacted:redacted@192.168.1.16:1880/endpoint/bird_feeder"
    requests.post(url, data = {"heartbeat": "OK"})
    logging.info("Sending heartbeat to Node-RED...")
heartbeat()
# END HEARTBEAT SENDER

def main():
    default_model_dir = "/data/models/detection"

    default_detect_model = "tf2_ssd_mobilenet_v2_coco17_ptq_edgetpu.tflite"  # "tf2_ssd_mobilenet_v2_coco17_ptq_edgetpu.tflite" / "efficientdet_lite3_512_ptq_edgetpu.tflite" /
    default_detect_labels = "tf2_ssd_mobilenet_v2_coco17_ptq_edgetpu.txt"  # "tf2_ssd_mobilenet_v2_coco17_ptq_edgetpu.txt" / "efficientdet_lite3_512_ptq_edgetpu.txt" /

    default_classification_model = "/data/models/classification/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite"
    default_classification_labels = "/data/models/classification/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.txt"

    parser = argparse.ArgumentParser()
    parser.add_argument("--detect_model", help=".tflite model path", default=os.path.join(default_model_dir, default_detect_model))
    parser.add_argument("--detect_labels", help="label file path", default=os.path.join(default_model_dir, default_detect_labels))
    parser.add_argument("--classification_model", help=".tflite model path", default=os.path.join(default_model_dir, default_classification_model))
    parser.add_argument("--classification_labels", help="label file path", default=os.path.join(default_model_dir, default_classification_labels))
    parser.add_argument("--top_k", type=int, default=1, help="number of categories with highest score to display")
    parser.add_argument("--rtsp_stream", required=True, help="The full rtsp stream url")
    parser.add_argument("--detect_threshold", type=float, default=0.7, help="detect score threshold")
    parser.add_argument("--classification_threshold", type=float, default=0.7, help="classification score threshold")
    parser.add_argument("--set_output_processed", default=False, action="store_true", help="save the processed image with object boxes")
    parser.add_argument("--set_output_processed_cropped", default=False, action="store_true", help="save the processed cropped images used for classification")
    parser.add_argument("--set_output_processed_classified", default=False, action="store_true", help="save the processed classified images")
    parser.add_argument("--snooze_time", type=int, default=3600, help="set time before detecting the same species again") # default is 1 hour
    args = parser.parse_args()

    logging.debug("Loading {} with {} labels.".format(args.detect_model, args.detect_labels))
    interpreter_detect = make_interpreter(args.detect_model)
    interpreter_detect.allocate_tensors()
    labels_detect = read_label_file(args.detect_labels)
    inference_size_detect = common.input_size(interpreter_detect)

    q = Queue(maxsize=1)  # avoids a backlog in memory while processing frames; we only need to capture the latest frame...

    def receive():
        while True:
            try:
                cap = cv2.VideoCapture(args.rtsp_stream)
                ret, frame = cap.read()
                q.put(frame)
                while ret:
                    ret, frame = cap.read()
                    q.put(frame)
            except:
                logging.info("No camera FEED found... will keep trying.")
                time.sleep(5)

    def process():
        while True:
            try:
                if q.empty() != True:
                    frame = q.get()

                    cv2_im = frame

                    cv2_im_rgb = cv2.cvtColor(cv2_im, cv2.COLOR_BGR2RGB)
                    cv2_im_rgb = cv2.resize(cv2_im_rgb, inference_size_detect)
                    # cv2.imwrite("/data/streamer/tests/test_" + str(variables.current_time) + ".jpeg", frame)

                    run_inference(interpreter_detect, cv2_im_rgb.tobytes())
                    detect_objs = detect.get_objects(interpreter_detect, args.detect_threshold)

                    # Updates current time for use as timestamp in all functions
                    variables.current_time = datetime.utcnow().strftime("%Y-%m-%d_%H-%M-%S-%f")[:-3]
                    # cv2.imwrite("/data/streamer/tests/test_image_" + str(variables.current_time) + ".jpeg", frame)

                    # Create empty dictionary to hold detection / classification data
                    variables.result_bird = {"name": [], "score": [], "detect_url": [], "classification_url": [], "cropped_url": []}

                    # Reset detections
                    variables.detect_sucess = False

                    if not detect_objs:
                        logging.debug("-----------DETECT RESULT-------------")
                        logging.debug("No objects detected.")

                    # for index in range(len(detect_objs)):
                    for index, obj in enumerate(detect_objs):

                        logging.debug("-----------DETECT RESULT-------------")
                        # logging.debug(labels_detect.get(obj.id, obj.id))
                        # logging.debug('  id:    ', obj.id)
                        # logging.debug('  score: ', obj.score)
                        # logging.debug('  bbox:  ', obj.bbox)

                        if labels_detect.get(obj.id, obj.id) == "bird":
                            variables.detect_index = index

                            logging.info("Bird object detected, sending to classification model [index: " + str(variables.detect_index) + "].")

                            # Crop the image for classification
                            cropped_image = crop_image(cv2_im, inference_size_detect, obj)

                            # Run interference on TPU (using cropped image)
                            classification(args.classification_model, args.classification_labels, args.classification_threshold, args.set_output_processed_classified, args.set_output_processed_cropped, cropped_image)

                    # If required, save frame that was processed
                    if args.set_output_processed and variables.detect_sucess:
                        processed_image = append_objs_to_img(cv2_im, inference_size_detect, detect_objs, labels_detect)
                        save_processed_image(processed_image)

                    # Process results
                    if variables.detect_sucess:
                        process_results()

                    # UPDATE TRACKER DICTIONARY
                    # Create filter to check dictionary with
                    now = time.time()
                    filter_v = now - args.snooze_time
                    # Check dictionary and remove old entries
                    for k, v in list(variables.bird_tracker.items()):
                        if v <= filter_v:
                            del variables.bird_tracker[k]
                            logging.info("Removed bird species: " + "'" + k + "'" + " from the snooze list...")

            except:
                logging.info("No camera FRAME found... will keep trying.")
                time.sleep(5)

    receive_thread = threading.Thread(target=receive, daemon=True)
    process_thread = threading.Thread(target=process)

    receive_thread.start()
    process_thread.start()

    process_thread.join()

def classification(classification_model, classification_labels, threshold, set_output_processed_classified, set_output_processed_cropped, cropped_image):
    # Horrible workaround to check if image exists (need to fix!)
    if not cropped_image.size > 2: # np.shape(cropped_image) == () | cropped_image is None | cropped_image.size>2
        logging.info("***Shit***, the classification image isn't valid, [index: " + str(variables.detect_index) + "].")
    else:
        logging.info("Great, the classification image is valid, [index: " + str(variables.detect_index) + "].")

        input_mean = 128.0
        input_std = 128.0
        count = 1
        top_k = 1

        logging.debug("Loading {} with {} labels.".format(classification_model, classification_labels))
        interpreter_classification = make_interpreter(classification_model)
        interpreter_classification.allocate_tensors()
        labels_classification = read_label_file(classification_labels) if classification_labels else {}
        inference_size_classification = common.input_size(interpreter_classification)

        # Model must be uint8 quantized
        if common.input_details(interpreter_classification, "dtype") != np.uint8:
            raise ValueError("***Shit***, the classification model only supports uint8 input types.")

        classification_image = cv2.cvtColor(cropped_image, cv2.COLOR_BGR2RGB)
        classification_image = cv2.resize(classification_image, inference_size_classification)

        params = common.input_details(interpreter_classification, "quantization_parameters")
        scale = params["scales"]
        zero_point = params["zero_points"]
        mean = input_mean
        std = input_std
        if abs(scale * std - 1) < 1e-5 and abs(mean - zero_point) < 1e-5:
            # Input data does not require preprocessing
            common.set_input(interpreter_classification, classification_image)
        else:
            # Input data requires preprocessing
            normalized_input = (np.asarray(classification_image) - mean) / (std * scale) + zero_point
            np.clip(normalized_input, 0, 255, out=normalized_input)
            common.set_input(interpreter_classification, normalized_input.astype(np.uint8))

        # Run inference on TPU
        logging.debug("----CLASSIFICATION INFERENCE TIME----")
        logging.debug("Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.")
        for _ in range(count):
            start = time.perf_counter()
            interpreter_classification.invoke()
            inference_time = time.perf_counter() - start
            classification_objs = classify.get_classes(interpreter_classification, top_k, threshold)
            logging.debug("%.1fms" % (inference_time * 1000))

        logging.debug("-------CLASSIFICATION RESULT---------")
        for c in classification_objs:
            logging.debug("%s: %.5f" % (labels_classification.get(c.id, c.id), c.score))

            # Check if bird has been detected recently - Note: can cause issues with testing as multiple bird images don't often align due to this check.
            if (labels_classification.get(c.id)) not in variables.bird_tracker:

                # Process only if threshold is acceptable
                if c.score >= threshold:
                    # Convert 0.xxxxxxxx to xx
                    percent = str(int(100 * c.score)) + "%"
                    label = str(labels_classification.get(c.id))

                    logging.info("Successful classification: " + label + " @" + percent + " accuracy")

                    # Used to track whether a successful classification has occurred
                    variables.detect_sucess = True

                    # Track species during this cycle
                    variables.result_bird["name"].append(label)
                    variables.result_bird["score"].append(percent)
                    # Track found bird to manage snoozes
                    variables.bird_tracker[label] = time.time()
                    logging.info("Adding bird species: " + "'" + label + "'" + " to the snooze list...")

                    # Save classification image
                    if set_output_processed_classified:
                        save_classification_image(classification_image, "classification", label, percent)

                    # Save cropped image
                    if set_output_processed_cropped:
                        save_classification_image(cropped_image, "cropped", label, percent)

            else:
                logging.info("Bird species recently detected, snoozing...")

def process_results():
    if variables.result_bird == {"name": [], "score": [], "detect_url": [], "classification_url": [], "cropped_url": []}:
        logging.debug("Cycle finished, no birds found.")
    else:
        logging.info("Cycle finished, found the following birds:\n" + str(variables.result_bird))

        # Send information to Zapier via email
        for k,v in variables.result_bird.items():
            # print(k)
            if k == 'name':
                species_number = len(v)
                species = (v)
            if k == 'score':
                score_number = len(v)
                score = (v)
            if k == 'detect_url':
                image_url = v[0]

        species = (str(species).strip('[]').replace('\'', '').replace(',', ' &'))
        score = (str(score).strip('[]').replace('\'', '').replace(',', ' &'))#print(score)
        body = "Number: {}\nSpecies: {}\nAccuracy: {}".format(species_number, species, score)

        send_email("redacted@robot.zapier.com", "New bird(s) detected!", body, image_url)

        # Send information to Node Red
        files = {'upload_file': open(image_url,'rb')}
        url = "http://redacted:redacted@192.168.1.16:1880/endpoint/bird_feeder"

        requests.post(url, files = files, data = variables.result_bird)

# START IMAGE HELPERS
def append_objs_to_img(image, inference_size_detect, objs, labels):
    logging.info("Appending labels to detect image [in memory]...")
    height, width, channels = image.shape
    scale_x, scale_y = width / inference_size_detect[0], height / inference_size_detect[1]
    for obj in objs:
        bbox = obj.bbox.scale(scale_x, scale_y)
        x0, y0 = int(bbox.xmin), int(bbox.ymin)
        x1, y1 = int(bbox.xmax), int(bbox.ymax)

        percent = int(100 * obj.score)
        label = "{}% {}".format(percent, labels.get(obj.id, obj.id))

        image = cv2.rectangle(image, (x0, y0), (x1, y1), (0, 255, 0), 2)
        image = cv2.putText(image, label, (x0, y0 + 30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 0, 0), 2)
    return image

def crop_image(image, inference_size_detect, obj):
    logging.info("Cropping detect image ready classification [in memory]...")
    height, width, channels = image.shape
    scale_x, scale_y = width / inference_size_detect[0], height / inference_size_detect[1]
    bbox = obj.bbox.scale(scale_x, scale_y)
    x0, y0 = int(bbox.xmin), int(bbox.ymin)
    x1, y1 = int(bbox.xmax), int(bbox.ymax)

    crop_correction = 0
    image = image[y0 - crop_correction : y1 + crop_correction, x0 - crop_correction : x1 + crop_correction]
    return image

def save_processed_image(image):
    logging.info("Saving detect image [to disk]...")
    detect_filename = "detect_" + str(variables.current_time) + ".jpeg"
    detect_filepath = os.path.join(script_directory, processed_directory, detect_filename)
    variables.result_bird["detect_url"].append(detect_filepath)
    cv2.imwrite(detect_filepath, image)

def save_classification_image (image, image_type, label, percent):
    logging.info("Saving " + image_type + " image [to disk]...")
    image_filename = image_type + "_" + str(variables.current_time) + "_[index_" + str(variables.detect_index) + "]_" + label + "_" + percent + ".jpeg"
    image_filepath = os.path.join(script_directory, processed_directory, image_filename)
    variables.result_bird[image_type + "_url"].append(image_filepath)
    cv2.imwrite(image_filepath, image)
# END IMAGE HELPERS

if __name__ == "__main__":
    main()
Swiftnesses commented 1 year ago

Here's my current Viseron config too:

# See the README for the full list of configuration options.

logger:
  # default_level: debug
  # logs:
    # viseron.components.ffmpeg: debug
    # viseron.components.edgetpu: debug
  # cameras:
    # bird_feeder: debug

ffmpeg:
  camera:
    viseron_bird_feeder:
      name: Viseron - Bird Feeder
      host: 192.168.1.3
      port: 7447
      path: /redacted
      # width: 1920
      # height: 1080
      # fps: 30
      # codec: h264
      # audio_codec: aac

edgetpu:
  object_detector:
    model_path: /detectors/models/custom/edgetpu/object_detection/tf2_ssd_mobilenet_v2_coco17_ptq_edgetpu.tflite
    label_path: /detectors/models/custom/edgetpu/object_detection/labels.txt
    device: pci
    cameras:
      viseron_bird_feeder:
        # log_all_objects: true
        fps: 10
        scan_on_motion_only: false
        labels:
          - label: bird
            confidence: 0.6
            trigger_recorder: true
  image_classification:
    model_path: /detectors/models/custom/cpu/image_classification/mobilenet_v2_1.0_224_inat_bird_quant.tflite
    label_path: /detectors/models/custom/cpu/image_classification/labels.txt
    device: cpu
    cameras:
      viseron_bird_feeder:
        labels:
          - bird
    # labels:
      # - person

nvr:
  viseron_bird_feeder:

# MQTT is optional
mqtt:
  broker: 192.168.1.16
  port: 1883
  username: redacted
  password: redacted
  home_assistant:
    discovery_prefix: homeassistant
    retain_config: true
Swiftnesses commented 1 year ago
Screenshot 2022-11-27 at 09 03 11

Good morning.

Can you explain these stats to me - I'm a little confused why I see this FPS per camera, and not for the Viseron platform?

How do I know if the edgetpu is overloaded? Currently I have 6 cameras, most at 10fps, one at 30fps.

roflcoopter commented 1 year ago

Those stats are for the object detector, not for the camera. If the EdgeTPU was overloaded those numbers would decrease.

If you use the same object detector for multiple cameras they will have the same values.

I am trying really hard with the image classification, btw, but I can't seem to figure out what's wrong :( Will keep looking.

Swiftnesses commented 1 year ago

I've been working hard on my config; I've now set up substreams and have all 12 cameras added!

Fingers crossed you can get classification working. Have you tested my code and confirmed it works for you, btw?

roflcoopter commented 1 year ago

Wow, 12 cameras, that's a lot!

No, I have not yet; I have been double-checking everything and it seems correct. I'm trying to set up a good test environment at the moment, but I don't have any spare cameras to use at the computer, so I need to look at alternatives.
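
One low-effort option, untested here, is to fake a camera by looping a video clip into a local RTSP server: run mediamtx (formerly rtsp-simple-server) and push a file into it with something like ffmpeg -re -stream_loop -1 -i birds.mp4 -c copy -f rtsp rtsp://127.0.0.1:8554/feeder, then point the camera config at that URL (the file name and path are placeholders).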

Swiftnesses commented 1 year ago

I use my UniFi Protect RTSP streams; works perfectly. I just started to use the substream option on a lower-resolution stream, and it seems to work well.

I'm still struggling to understand the FPS indicator: if I lower the object detector FPS on all my feeds, it goes up, and vice versa. I'm trying to understand how I know when I'm at the limit... Sorry if that sounds dumb!

Swiftnesses commented 1 year ago
Screenshot 2022-11-27 at 09 55 34

I also think this HA sensor should be of type number or (better) statistic, so we can see graph data :)

Swiftnesses commented 1 year ago

Adding temperature for the USB and PCIe EdgeTPU would also be awesome!

I have a PCIe card and I run cat /sys/class/apex/apex_0/temp.
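
A tiny sketch of reading that from Python, assuming the PCIe driver reports the value in millidegrees Celsius (worth double-checking against your card):

# Read the PCIe Edge TPU temperature from sysfs and convert to degrees Celsius.
with open("/sys/class/apex/apex_0/temp") as f:
    temp_c = int(f.read().strip()) / 1000
print("Edge TPU temperature: %.1f C" % temp_c)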

Swiftnesses commented 1 year ago

Any luck @roflcoopter?

roflcoopter commented 1 year ago

Ran out of time today, will take a stab at it again during the week!

Regarding your other questions: the FPS indicator shows the rate at which inference is running, to give you an indication of how fast a particular model is, for instance. A higher number means inference is very fast; lower numbers mean you are getting closer to the maximum the hardware can handle.
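
As a rough illustration (assuming inference capacity is simply shared across cameras): if the detector sensor reports around 100 FPS and the cameras together request 5 × 10 + 30 = 80 scans per second, the TPU is getting close to its limit; at 40 requested scans per second there is plenty of headroom.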

And yes, it should definitely be a number; will fix.

roflcoopter commented 1 year ago

It doesn't seem to matter how I twist and turn my code; the confidence levels are always very low.

Could you give me an example of how you invoke your script, and I can try to see if I get similar results?

Swiftnesses commented 1 year ago

I just run the code above and feed it an RTSP stream (via the argument); I then use my phone to show the picture linked above. Confidence is around 80% the first time.
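
For anyone following along, an invocation matching the argparse flags in the script above would look something like the following (the script name and credentials are placeholders):

python3 bird_classifier.py --rtsp_stream rtsp://user:pass@192.168.1.3:7447/bird_feeder --detect_threshold 0.7 --classification_threshold 0.7 --set_output_processed --set_output_processed_classified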