jeffbass / imagezmq

A set of Python classes that transport OpenCV images from one computer to another using PyZMQ messaging.
MIT License

How to scale ImageHub to work for 200+ cameras #54

Open · Udayaprasad opened this issue 3 years ago

Udayaprasad commented 3 years ago

Hi @jeffbass,

We are working on a video surveillance project where we need to feed 200+ camera streams into multiple computer vision algorithms. This project looks great, but can you suggest the best way to scale the ZMQ setup?

Thanks, Uday

jeffbass commented 3 years ago

Hi Uday @Udayaprasad,

My project architecture has many cameras sending images to a few hub computers that analyze those images. The imageZMQ package is just an image transport tool. It suits my own project perfectly, but there are several questions that might help you decide if imageZMQ will scale well for you.

  1. Do you need the cameras to send every image? Or can you pair the camera with a local edge computer (e.g. Raspberry Pi) to selectively send only some subset of relevant images?
  2. Are you OK with image compression (e.g. jpg) before sending?
  3. Do you need a series of still images in a stream or do you need an actual video stream (they are not the same thing)?
  4. Do all your cameras need to send images at the same time?
  5. Do you want to use the REQ/REP or the PUB/SUB messaging pattern of ZMQ?
  6. How big do your images need to be?

Here are my thoughts:

  1. Sending a subset of images works well in my project. I have 24 Raspberry Pi computers with attached cameras capturing images at 16-32 FPS. That means 1 to 3 million frames a day per camera. But I run a program in each Raspberry Pi that sends only a small fraction of those images. The image selection code only sends images that meet certain criteria (person detected, vehicle detected, coyote detected, etc.). This means the network never gets bogged down and the hubs have no trouble keeping up. That would not be possible if I were sending all of the images from every camera. (The sender sketch after this list shows this selective-sending pattern.)
  2. OpenCV images are large; compressing them to jpgs lowers the network load by 70 to 90%. But jpgs lose some image quality (not much, and it doesn’t affect my own use case). And the compression itself takes significant time on a Raspberry Pi. It is a tradeoff: higher network load for raw OpenCV images versus higher CPU load for jpg-compressed images. I have found that sending jpgs works best for me, but it is worth running timing tests because image size, image heterogeneity, etc. can make a big difference. (The sender and hub sketches after this list show the jpg round trip.)
  3. Images versus video: My own application is about analyzing individual frames at the hub computers. So sending individual frames (which is what imageZMQ does) works well for me. But you should consider whether video would work for you. A video stream does not send every image; it sends a full image (a “reference” image), then sends only the parts of the image that change for a period of time until it sends the next reference image. There are many video codecs and video compression tools, but all of them use some sort of frame-to-frame encoding that specifically AVOIDS sending every frame. Netflix could never stream movies if it actually sent all the frames. Sending video is almost always faster than sending individual image frames. Compressing images to jpgs is not the same as video compression, because each jpg is compressed individually with no information or “deltas” from previous or subsequent images. The imageZMQ package sends individual images (with optional jpg compression), but it does not send compressed video. Compressed video may be a better choice for 200 cameras.
  4. Do all your cameras need to send images at the same time? In my own setup, I have the network topology set up so that cameras that are less likely to send images at the same time share a network. This means my driveway cam watching the mailbox (which has relevant images sent during the daytime) can share a network hub with an infrared coyote cam behind the barn really well, since these 2 cameras will never send images at the same time. But I design my network so that the multiple infrared coyote cams are on different subnetworks, since they are more likely to send at the same time. This kind of time-sharing analysis allowed me to optimize my different subnets and optimize bandwidth. I found it helpful, but that may not be possible for you.
  5. REQ/REP versus PUB/SUB messaging pattern: There are 2 different messaging patterns available in imageZMQ: REQ/REP and PUB/SUB. Each has advantages and disadvantages. Depending on how your cameras are networked and how powerful your hub computers are, PUB/SUB may be faster, but PUB/SUB can also drop some frames when the network is busy. I use REQ/REP in my project. You can read more about imageZMQ messaging patterns in these docs. (A PUB/SUB sketch follows this list.)
  6. How big do your images need to be? While large images look good on the screen or in a photo album, they are often overkill for simple computer vision tasks. Many computer vision machine learning techniques reduce the image to something small (256 x 256 pixels is a common size) to do image classification or object detection. The smaller your image size, the faster your computer vision pipeline will be and the less you will overload your network. I have found that 640 x 480 pixel images are large enough for my projects; they take a LOT less bandwidth than 1080p or larger images. And for some of my images, 320 x 240 pixels is enough to get a particular task done (like reading a water meter). (The sender sketch after this list includes a resize step.)
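
To make points 1, 2, and 6 concrete, here is a minimal sender sketch. The imageZMQ calls (`ImageSender`, `send_jpg`) are the package's actual API; the hub address, camera name, and the `is_relevant()` test are placeholders you would replace with your own address and selection logic.

```python
# Minimal sender sketch: keep only "relevant" frames, resize them small,
# jpg-compress them, and send them to the hub (REQ/REP pattern).
# The hub address and is_relevant() below are placeholders.
import cv2
import imagezmq

sender = imagezmq.ImageSender(connect_to='tcp://192.168.1.100:5555')
cam_name = 'driveway-cam'        # identifies this camera at the hub
cap = cv2.VideoCapture(0)        # or a picamera / RTSP capture

def is_relevant(frame):
    # Placeholder for your own selection logic (motion detection,
    # person / vehicle detection, etc.). Returning False keeps the
    # frame off the network entirely.
    return True

while True:
    grabbed, frame = cap.read()
    if not grabbed:
        break
    if not is_relevant(frame):
        continue                               # point 1: send only a subset
    frame = cv2.resize(frame, (640, 480))      # point 6: smaller images
    ok, jpg_buffer = cv2.imencode(             # point 2: jpg compression
        '.jpg', frame, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
    if ok:
        sender.send_jpg(cam_name, jpg_buffer)  # blocks until hub replies
```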
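The matching hub side decodes each jpg back into an OpenCV image and acknowledges it; `recv_jpg()` and `send_reply()` are the REQ/REP hub methods:

```python
# Minimal hub sketch for the REQ/REP pattern: receive jpg buffers,
# decode them back to OpenCV images, and acknowledge each one.
import cv2
import numpy as np
import imagezmq

hub = imagezmq.ImageHub(open_port='tcp://*:5555')

while True:
    cam_name, jpg_buffer = hub.recv_jpg()
    image = cv2.imdecode(np.frombuffer(jpg_buffer, dtype='uint8'), -1)
    # ... run your computer vision pipeline on `image` here ...
    hub.send_reply(b'OK')        # REQ/REP: the sender waits for this
```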
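For point 5, switching to PUB/SUB is a one-argument change (`REQ_REP=False`) on each end, with the bind/connect roles reversed: the sender (publisher) binds to its own port and each hub (subscriber) connects to the sender's address. A sketch, with 192.168.1.50 standing in for a sender's address:

```python
# PUB/SUB sketch: pass REQ_REP=False on both ends. No send_reply() is
# needed, but frames can be silently dropped if a subscriber falls behind.
import imagezmq

# On each sending (publishing) computer -- binds to its own port:
sender = imagezmq.ImageSender(connect_to='tcp://*:5555', REQ_REP=False)
# ... sender.send_image(cam_name, frame) inside your capture loop ...

# On the hub (subscribing) computer -- connects to each sender's address:
hub = imagezmq.ImageHub(open_port='tcp://192.168.1.50:5555', REQ_REP=False)
cam_name, frame = hub.recv_image()   # no reply is sent in PUB/SUB
```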

I have an overview of my Yin Yang Ranch project that uses imageZMQ in this GitHub repo. I also gave a 30-minute PyCon 2020 presentation about it. (Here is the slide deck if you want the outline without the talk itself.)

Here are some other lessons I have learned as I scaled to larger numbers of cameras using imageZMQ. This comes from a post I made on a message board about ways to set up a network of cameras using imageZMQ. Some of it may be helpful to you; some of it restates the ideas above.

Feel free to ask follow-on questions in this thread and I’ll do my best to answer them. Also, please share what you learn as you build out your own system. If you find something better than imageZMQ for scaling to 200 cameras, please comment in this thread and let me (and the other imageZMQ users) know. All of us are always iterating toward newer and faster tools.

Jeff

Udayaprasad commented 3 years ago

Thank you for the detailed explanation, @jeffbass.

1. We would like to keep 15 FPS for each camera (frames resized to 416x416), and we are using the RTSP protocol to get the camera feeds. Each camera runs in its own thread (a rough sketch of one such thread is below).
2. The PUB/SUB model will be better for us because we need to pass each model's predictions on to other models for better reusability (see the attached example diagram).
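
As a rough sketch (the RTSP URL and port are placeholders for our real cameras), each per-camera thread looks something like this: read the RTSP feed, resize to 416x416, and publish the frames with imageZMQ's PUB/SUB pattern.

```python
# Rough sketch of one per-camera thread: read an RTSP feed, resize to
# 416x416, and publish frames via imageZMQ PUB/SUB. The RTSP URL and
# port below are placeholders.
import threading
import cv2
import imagezmq

def camera_thread(rtsp_url, cam_name, port):
    # Publisher binds; downstream model processes subscribe to this port.
    sender = imagezmq.ImageSender(
        connect_to=f'tcp://*:{port}', REQ_REP=False)
    cap = cv2.VideoCapture(rtsp_url)
    while True:
        grabbed, frame = cap.read()
        if not grabbed:
            continue
        frame = cv2.resize(frame, (416, 416))
        sender.send_image(cam_name, frame)

t = threading.Thread(
    target=camera_thread,
    args=('rtsp://user:pass@10.0.0.21/stream1', 'cam-01', 5555))
t.start()   # one such thread is started per camera
```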

I will be sure to keep this issue open and provide my findings as and when I am successful with the whole integration.

[attached image: example diagram of the model pipeline]