snowzach / doods

DOODS - Dedicated Open Object Detection Service
MIT License

Support getting image from a file #17

Closed zaneclaes closed 3 years ago

zaneclaes commented 4 years ago

Loving doods so far.

My goal is to have ~10 cameras each doing an image detection every ~1 second (some cameras looking for wildlife, others cars, others people). I have plenty of CPU cycles sitting unused, so I spun up 10 doods containers on my K8s cluster. I used the Home Assistant integration and set scan_interval: 2 (two seconds). Everything works well enough, but I'm wondering if I can further optimize by making the detectors access files directly; right now, I appear to be using ~500 Kbps in constant bandwidth.

[Screenshot: constant ~500 Kbps network traffic (Screen Shot 2020-05-01 at 9 28 12 AM)]

I'm already running MotionEye upstream of doods, so it's easy for me to create a still .jpg image every 1 second for doods to consume. And I know doods can simply cat a local file, which could be accessible to it through a shared mounted volume with MotionEye. My question has more to do with the Home Assistant side. Is there an easy way to switch over the detector integration to use this approach? Or am I looking at what amounts to a rewrite of the custom component?

snowzach commented 4 years ago

I am not totally sure what you mean? You can create a camera in home assistant that references a file, tell home assistant to scan every second, and it should read that file. It might be tricky if it tries to read the file while it's still being written. DOODS itself is just an API. You might get better help posting in the Home Assistant forums.

zaneclaes commented 4 years ago

I'm referring to this configuration (docs):

image_processing:  
  - platform: doods
    url: "http://doods.home.svc.cluster.local:8080"
    [...]

Please correct me if I am wrong @snowzach, but I was under the impression that you are the maintainer of the doods integration for the image_processing component. The problem is the use of a url to transmit the file contents to DOODS, which is extremely taxing on the network and is causing the processing to fail regularly. Compare this against, say, the TensorFlow integration which is capable of doing everything with local file references (no network call).

And yes, I understand what you're saying. I can think of a few ways I might manually configure DOODS to scan files, but that doesn't solve the integration problem AFAICT. In other words, I need to use the image_processing component because that's the "correct" way for Home Assistant to, uh, process images. But if the platform: doods only supports a url instead of, say, a folder to watch... then it won't scale to my deployment.

snowzach commented 4 years ago

Hey @zaneclaes, DOODS is designed as a service and uses an API. If it worked by watching a folder, it would be pretty difficult to interact with from a program. If I wrote 3 files to a directory, how do I get the response back, or correlate which response goes with which image?

The tensorflow component and the DOODS component in home assistant work exactly the same. They take image data from something like a camera component, forward it to the object detection component, and get back detection data that, along with the image, can be written to a file. The tensorflow component cannot read files either; it only takes other home assistant components as input. (You can use a camera component to read a file; that works the same for DOODS or Tensorflow.) The only difference is that the tensorflow component is integrated into home assistant while the DOODS component communicates with an API.

If you run the DOODS component on the same machine as home assistant there really is no network overhead. It's all internal to the same machine.


zaneclaes commented 4 years ago

If you run the DOODS component on the same machine as home assistant there really is no network overhead. It's all internal to the same machine.

In my screenshot from above showing the network traffic, doods and Home Assistant (containers) are running on the same machine. While it's true it's faster on the same machine, the data still goes through TCP/IP and suffers all of the inherent latency therein. Plus, network bandwidth is not free; it necessarily impacts the performance of other containers. Using distributed tracing (OpenTelemetry), I can confirm that 50-70% of the total processing time is spent transmitting data back and forth from doods. This can also be confirmed by the fact that Home Assistant is timing out on 2-3 second update intervals, while the doods logs themselves show a processing time never greater than ~0.75 seconds.

Doods log:

{"level":"info","ts":1589894137.5808613,"caller":"tflite/detector.go:431","msg":"Detection Complete","package":"detector.tflite","name":"default","id":"","duration":0.492695043,"detections":0,"device":null}

Corresponding Home Assistant log:

Updating doods image_processing took longer than the scheduled update interval 0:00:02

If it worked on watching a folder it would be pretty difficult to interact with a program. If I wrote 3 files to a directory, how do I get the response back or correlate which response goes with which image.

Instead of transmitting the entire binary contents of the file back and forth to doods as part of the API call/response, the API can accept/respond with the name of the file. Nothing else needs to change. To use the analogy of a C++ pointer, right now we're doing memcpy instead of just passing by reference.
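
For illustration only, here is a rough sketch of what such a pass-by-reference call could look like, assuming a volume shared by MotionEye, Home Assistant, and DOODS. The "file" field below is not part of the current DOODS API; it is purely hypothetical:

import json
import requests

# Hypothetical request: pass a path on a shared volume instead of the
# base64-encoded image bytes.
# NOTE: the "file" field does NOT exist in the current DOODS API.
payload = {
    "detector_name": "default",
    "detect": {"*": 60},
    "file": "/shared/motioneye/camera1/latest.jpg",  # hypothetical field and path
}
response = requests.post("http://doods.home.svc.cluster.local:8080/detect",
                         data=json.dumps(payload),
                         headers={"Content-type": "application/json"})
print(response.json())  # detections would still come back over the API as usual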

snowzach commented 4 years ago

Are you running this through flannel as well? 50%-70% of the processing time in network transit seems pretty high to me when traveling on the same host. I don't see how it could possibly take 1.5 seconds to send the data unless it's a huge image or there is something wrong. The detection time doesn't actually take into account if the image needs to be converted. I'll update that eventually.

With that said, it could potentially support a file. Both this project and https://github.com/snowzach/pydoods would need to be updated. I welcome a pull request. I'm not going to have time to do it for a while.

zaneclaes commented 4 years ago

Thanks for taking the time, btw. I had hoped this was a simple solution, but I might have some time to try to help with a file-based approach. Not today, but perhaps soon ;)

I am running Flannel on my home k8s cluster, yes. It certainly doesn't fail all the time, but it's enough to show up hundreds of times in my HA logs. I think the key may be that I have 10 cameras which I'm trying to achieve near real-time detection with. Plus, a k8s cluster with ~8 hosts, running dozens of other containers as well. I'm guessing the errors happen when several things are competing for bandwidth at once...

MYeager1967 commented 3 years ago

I'm trying to access the API from python. Here's what I'm trying to do and I'm getting 400 back as a response.

    newHeaders = {'Content-type': 'application/json'}
    response = requests.post("http://192.168.0.4:8080/detect", data = {"detector_name":"default", "detect":{"*":60}, "data":"/home/homeassistant/snapshots/latest-garage.jpg|base64"}, headers=newHeaders)
    self.log(response.status_code)

I pieced this together from the Wiki but I really am grasping at straws. I know I get a JSON response when it works. Can I also retrieve the image somehow?

snowzach commented 3 years ago

You need to load the image into a variable and then base64 it and then pass it in data. I'm not near a computer now but can work something out in the next day or so if you don't get it.

MYeager1967 commented 3 years ago

I replied to the other comment as well. If you could help figure it out, it's kicking my ass. Importing a library is difficult in the environment I'm working in and every time it updated things would break.

snowzach commented 3 years ago

I think this will do it


import base64

with open("yourfile.jpg", "rb") as image_file:
    image_data = base64.b64encode(image_file.read())

newHeaders = {'Content-type': 'application/json'}
response = requests.post("http://192.168.0.4:8080/detect", data = {"detector_name":"default", "detect":{"*":60}, "data":image_data}, headers=newHeaders)
self.log(response.status_code)

MYeager1967 commented 3 years ago

I think that's something close to what I have but I can't verify it at the moment. I'm away for a few days myself. 😀

I know this is probably a dumb question, but is there a way to change the color of the bounding boxes and the size of the label font or turn it off?

MYeager1967 commented 3 years ago

No joy. Here's the response I get back with a code of 400.

{"error":"invalid character 'd' looking for beginning of value","code":3,"message":"invalid character 'd' looking for beginning of value"}

Would be nice to figure this out as I could take one more step out of the loop. Did you happen to see my question about the labels and bounding boxes?

snowzach commented 3 years ago

This will do it..

import base64
import requests
import json

with open("/data/git/video/doods/front_yard.jpg", "rb") as image_file:
    image_data = base64.b64encode(image_file.read())

response = requests.post("http://192.168.0.4:8080/detect", data=json.dumps({"detector_name":"default", "detect":{"*":60}, "data":image_data.decode('ascii')}), headers={'Content-type': 'application/json'})
print(response.json())

The boxes are drawn by home assistant. Doods only returns the data to tell you where to draw the boxes.
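
If you do want to draw the boxes yourself outside of home assistant, here is a rough sketch using Pillow. It assumes each detection in the response carries normalized top/left/bottom/right coordinates plus label and confidence, which is how the home assistant integration consumes them:

from PIL import Image, ImageDraw

def draw_boxes(image_path, detections, out_path="annotated.jpg"):
    # Assumes DOODS-style detections: 0-1 normalized top/left/bottom/right,
    # plus "label" and "confidence" fields.
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    width, height = image.size
    for det in detections:
        box = (det["left"] * width, det["top"] * height,
               det["right"] * width, det["bottom"] * height)
        draw.rectangle(box, outline="red", width=3)
        draw.text((box[0] + 4, box[1] + 4), f'{det["label"]} {det["confidence"]:.0f}%', fill="red")
    image.save(out_path)

# e.g. draw_boxes("front_yard.jpg", response.json().get("detections", []))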

MYeager1967 commented 3 years ago

I'll give this a shot when I get a chance to sit down at the computer. Not sure what you changed but I appreciate the effort. I did not realize that DOODS (Tensorflow) didn't handle the bounding boxes. I figured it processed the whole thing...

zaneclaes commented 3 years ago

He added the .decode('ascii'). The error you saw was due to the fact that b64encode returns a bytes-like object rather than a string.
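
A minimal illustration of that distinction:

import base64

raw = base64.b64encode(b"\xff\xd8\xff\xe0")   # e.g. the first bytes of a JPEG
print(type(raw))                    # <class 'bytes'> -- not JSON serializable as-is
print(type(raw.decode("ascii")))    # <class 'str'>   -- safe to embed in the JSON body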

FWIW, here's a snippet of code I use to call DOODS. It resizes to 300x300 before sending, which reduces errors and overall transport size. This was actually the solution to the original problem I raised in this thread. By sending properly sized images, the whole process sped up immensely. Plus, now I have a Google Coral, and each detection takes < 0.05 seconds.

This code is part of a "recorder" I made, which saves animated gifs of any sequence of frames that detect something.


async def _detect(self, fn):
    """Call the detection API with an image file"""
    try:
        image = Image.open(fn).resize((300, 300))
        buffered = BytesIO()
        image.save(buffered, format="JPEG")
        data = base64.b64encode(buffered.getvalue()).decode('utf-8')

        body = json.dumps({
            "detector_name": self._detector_name,
            "data": data,
            "detect": self._detector_opts
        })
        res = await self._async(None, self._session.post, self._endpoint, body)
        _LOGGER.info(f'motion record detect response [{res.status_code}]: {res.text}')
        if res.status_code != 200:
            _LOGGER.error(f'motion record detect failed [{res.status_code}]: {res.text}')
            return None
        res = res.json()

        if not res: return None

        cnt = 0
        if not 'detections' in res: return cnt

        for det in res['detections']:
            det['frame'] = self.num_frames
            lbl = det['label']
            con = float(det['confidence'])
            _LOGGER.info(f"motion record detect {fn}: {lbl} @ {int(con)}%")
            if not lbl in self._detections: self._detections[lbl] = []
            self._detections[lbl].append(det)

            if con > self._max_confidence:
                # New "best label" for this recording.
                self._max_confidence = con
                self._label = lbl
            cnt += 1  # count this frame's detections so the return value reflects them
            self._count += 1

        return cnt
    except Exception as e:
        _LOGGER.warning(f'motion record detect {type(e)}: {e}')
        return None

MYeager1967 commented 3 years ago

Nice. I should have caught that as I had it on an earlier attempt at getting this to work. Too early in the morning for me I suppose.

I send the full size image as I want the detail on the detected image. I'm quite impressed by the speed and accuracy of the program as it is.

zaneclaes commented 3 years ago

I send the full size image

Have you considered the fact that DOODS resizes the images down to the appropriate detection dimensions? Using the pre-trained models, the max resolution you can detect is 300x300.

I want the detail on the detected image.

You can keep the original, as I do -- just send a resized version so you're not wasting bandwidth/time on the DOODS side.

snowzach commented 3 years ago

It depends on the model. Most of the tflite models are lower resolution. The tensorflow inception model will use the full resolution images, fwiw.

MYeager1967 commented 3 years ago

I use the full inception model but even if I didn't, I don't think I'd gain much by doing the conversion myself. Not too concerned with the transport time as it's local. I see there are now inception models that are compiled for the Coral stick. I stay away from the tflite models because the accuracy is horrible. But things keep evolving...

zaneclaes commented 3 years ago

Interesting. This is one of the next things I wanted to look into. Now that I have the Coral, I'm wondering if there are good models I can use that will do high-res detection. I only care about a small number of labels (person, car, dog, etc.). But I assume a model which can accept higher resolution will yield better results...

MYeager1967 commented 3 years ago

The support page for the Coral lists many different models. Not sure of the resolution, as it may be limited by the underlying tflite engine. If the full tensorflow engine ever supports the Coral stick, I'll buy two. As it is, I'm processing a new image every few seconds (on demand, not all the time) with no issues. I've had it running 4 cameras at once with good results. I'm running motionEye and DOODS, along with a few other very low-resource dockerized programs, on an older i7 Lenovo ThinkStation. It's a tiny little box about the size of a standard home router. Churning 4 cameras and recording video on all 4, I see about 70% CPU usage. I'm only using motionEye to record, though; actual motion detection is done by the cameras.

MYeager1967 commented 3 years ago

By the way, the above code works perfectly. Thank you again for your assistance with it.