machinebox / issues

Machine Box issues, bugs and feature requests

Facebox maxing out CPU under low/moderate load #45

Closed DConcord closed 5 years ago

DConcord commented 5 years ago

This week I deployed two new servers:

In both cases, the machines run nothing but Docker with the latest Facebox image.

In both cases, I send Facebox a rate-limited burst of 10 - 20 photos, spaced 0.5 seconds apart. Resolutions range from 2688x1520 (typically only 3 or so of these per burst) down to 864x1296 (typically the majority). Facebox processes a few, then begins returning either Error: socket hang up : http://x.x.x.x:8080/facebox/check or Error: connect ECONNREFUSED x.x.x.x:8080 : http://x.x.x.x:8080/facebox/check
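The burst described above can be sketched as a small rate-limited loop. This is a minimal sketch, not the client actually used in this report: `send_burst`, its `send` callback, and the injected `sleep` are hypothetical names, and the callback stands in for whatever HTTP call posts one image to /facebox/check.

```python
import time
from typing import Callable, Iterable, List, Tuple


def send_burst(
    images: Iterable[str],
    send: Callable[[str], str],
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[bool, str]]:
    """Send images one at a time, pausing interval_s between requests.

    `send` is a hypothetical callback that posts a single image and returns
    the response body. Connection failures (socket hang up, connection
    refused) surface as OSError subclasses and are recorded, not raised,
    so the whole burst can be inspected afterwards.
    """
    results: List[Tuple[bool, str]] = []
    for index, image in enumerate(images):
        if index > 0:
            sleep(interval_s)  # rate limit: wait before every request but the first
        try:
            results.append((True, send(image)))
        except OSError as exc:
            results.append((False, str(exc)))
    return results
```

Recording failures instead of raising makes it easy to see at which point in the burst the box stops answering.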

And when checking docker stats, the full available CPU is maxed out:

i7 CPU:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
4ba3c49c364d        facebox             700.95%             2.571GiB / 6.805GiB   37.79%              35.1MB / 172kB      0B / 0B             29

E3 CPU:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
a7b618b36533        facebox             400.23%             505.9MiB / 15.62GiB   3.16%               9.4MB / 2.07MB      0B / 0B             16

The CPU typically remains pegged this way until the container is restarted. Setting MB_WORKERS makes no difference, and setting --cpus= to a lower number just causes the CPU to max out in the same way at that lower core count. Occasionally it resolves on its own, and Facebox returns a face count of 700 to 3,000+ unknown faces and an array with that many results. (I attached a capture of one of these, but my logger couldn't capture the entire thing.) Facescount 3475.txt

I've been running Facebox over the last month on a server with dual Xeon 5530 CPUs using the Facebox_noavx image. While the processing speed is much slower, the stability has been solid. I was very excited to see the touted 5-10x performance boost with AVX/AVX2 processors, but then bummed to come across this issue, which renders the boxes unusable.
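Until the root cause is found, a client could at least refuse to trust responses like the one above. The following is a minimal sketch under assumptions: the response field names `facesCount` and `faces` and the threshold of 50 faces per frame are guesses for illustration, not values confirmed in this thread.

```python
from typing import Any, Dict


def looks_plausible(response: Dict[str, Any], max_faces: int = 50) -> bool:
    """Return False for responses that report an implausible number of faces.

    Assumptions: the parsed JSON carries an integer "facesCount" and a list
    "faces"; max_faces is an arbitrary ceiling chosen for a single camera frame.
    """
    count = response.get("facesCount")
    faces = response.get("faces", [])
    if not isinstance(count, int) or count < 0:
        return False
    if not isinstance(faces, list):
        return False
    # Reject when either the reported count or the array length is too large.
    return count <= max_faces and len(faces) <= max_faces
```

A response flagged this way could be logged and the image retried after the container has been restarted.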

Any assistance is greatly appreciated, thanks!

dahernan commented 5 years ago

I'm surprised about the stability problems; we have never had problems with it, and the only difference between the AVX and no-AVX images is the compilation flags.

My advice is to review your installation and config. Also, you don't need to send high-resolution photos: you will get the same result with 1000x800 or even lower, and Facebox will spend much less time.
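As a rough illustration of the downscaling suggestion, the target size for a photo can be computed before resizing it with any image library. This is a minimal sketch; `fit_within` is a hypothetical helper, and treating 1000x800 as a bounding box that preserves aspect ratio is an assumption about what is meant.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_w: int = 1000, max_h: int = 800) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside max_w x max_h.

    Aspect ratio is preserved and images that already fit are left untouched
    (no upscaling). Only the target dimensions are computed here; the actual
    resize is left to whichever image library is in use.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(1.0, max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For the resolutions mentioned in this issue, a 2688x1520 frame would be limited by its width and an 864x1296 frame by its height.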

Let us know if you make any progress.

DConcord commented 5 years ago

The large photos are necessary because the source camera covers a larger area, so the resulting faces are often low resolution. But I only run 3 of these at a time, spaced 1 second apart. However, I've done a few things to simplify even further:

This is already a fresh install of Ubuntu 18.04 on bare metal, running nothing but the latest Docker CE (installed using the instructions from their website) and the latest docker pull machinebox/facebox. On top of that, I tried running with no faces taught (thinking perhaps one of the images was corrupt or causing odd behavior) but got the same results. I even tried eliminating the larger photos and only running bursts of 1296x864 or smaller and got the same results (it only takes about 15 photos over a few seconds to replicate the issue).

One thought: there are several delivery options available, and currently I'm using Base64 encoded string via JSON. Is there a preferred method of those available? I might give Base64 encoded string via POST a try and/or see if I can figure out Direct HTTP Post (Binary)
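For reference, the cost of the Base64-via-JSON option can be sketched as follows. This is a minimal sketch: the JSON field name `base64` is an assumption about the request shape, and both function names are hypothetical. Base64 itself always turns every 3 input bytes into 4 output characters, so the request body grows by about a third before the server decodes it again.

```python
import base64
import json


def build_base64_payload(image_bytes: bytes) -> bytes:
    """Wrap raw image bytes in a JSON body as a Base64 string.

    Assumption: the service accepts {"base64": "<data>"}; verify the field
    name against the API documentation before relying on it.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"base64": encoded}).encode("utf-8")


def base64_overhead(image_bytes: bytes) -> float:
    """Ratio of the Base64-encoded size to the raw size (about 4/3)."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    return len(base64.b64encode(image_bytes)) / len(image_bytes)
```

A binary POST avoids this inflation and the extra encode/decode step, which matters more as image resolution goes up.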

dahernan commented 5 years ago

The optimal method is direct HTTP POST, since it saves the work of encoding and decoding. But I don't think it will change much. The main thing is that working with high resolutions is very heavy. I know it's not ideal, but what about cropping the photo into squares?
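The cropping idea could be sketched as computing square tile boxes over the frame and sending each crop separately. This is a minimal sketch: `tile_boxes`, the 800 px tile size, and the 100 px overlap (added so a face sitting on a seam still appears whole in at least one tile) are all assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, tile: int = 800, overlap: int = 100) -> List[Box]:
    """Return crop boxes of at most tile x tile pixels covering the image.

    Neighbouring tiles share at least `overlap` pixels. The last tile in each
    row and column is shifted back so it ends exactly at the image edge.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    step = tile - overlap

    def starts(length: int) -> List[int]:
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # final tile flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

Each box can be passed to an image library's crop function; any face coordinates returned for a tile then need the tile's left and top offsets added back to map them onto the full frame.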

dahernan commented 5 years ago

Closing, feel free to reopen.

DConcord commented 5 years ago

Hey, checking in. The issue is still happening regularly, as described. I have some more data, however. The size of the picture and the upload method don't seem to matter: I've tried nearly all of the available methods, including direct URL pushes of the images, all with the same result. Interestingly, though, I was able to get hold of a brand new MacBook Pro with an i7-8559U CPU (Coffee Lake), and the system ran flawlessly! I was eventually able to overload the CPU, but only at a more than reasonable limit, perhaps 10x or more the load my other CPUs can take, and it never once maxed out the CPU and spat out a huge, extraneous result (as described above).

My theory is that the images designed to work with AVX are compiled to be optimized for Intel Skylake and newer CPUs since that is primarily what is being run on AWS, etc. My understanding is that Coffee Lake is a more incremental derivative of Skylake and more of a sub-category (could be way off on that). But while my two older CPUs are new enough to have AVX and even AVX2, they must be just different enough from Skylake to consistently have these issues. They are Sandy Bridge and Haswell, btw.

dahernan commented 5 years ago

Thanks for all that info and the time spent analysing; it is very interesting. Indeed, the main compilation target is Intel. I'll try to review the compilation flags.

DConcord commented 5 years ago

Thanks! And I have some more analysis.

Very unfortunately, it looks like my Intel Skylake+ theory is wrong. I got hold of a brand new box with an Intel i7-8700 (6 cores plus hyper-threading, 16 GB RAM, NVMe SSD) and, to my surprise, it has exactly the issues described in the original post: it quickly maxes out the CPU, becomes unresponsive, and spits out erroneous results with dozens, hundreds, or thousands of matched faces when there are none.

I went back to the MacBook Pro that was working better and realized that the Docker image it is running is 6 months old, while the image on the new PC is 6 weeks old. That would put the older image somewhere back in Octoberish. I saved, transferred, and loaded that 6-month-old image onto the new box, and it runs much, much better! It is still not as stable as I would expect, though: it only seems to use 3 of the 12 available CPU threads at max load and eventually crashes/restarts after about 23 seconds of receiving a 1080p picture every 0.4 seconds, which is not a massive load. (EDIT: I found an issue on my camera server where a max sessions limit was causing issues.)

As a test/workaround, I started 4 separate Facebox instances (the 6-month-old version) on the new server, each limited to 3 CPUs (12 total). Using a Kemp load balancer, I'm able to go full speed and can't crash the thing when I send a 1080p picture every 0.2 seconds!! Much, much better performance, but there's definitely something wrong with newer versions of the Docker image, and even the old one isn't perfect (unless 3-4 CPUs is closer to the intended functionality).
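Without a hardware load balancer, the same spreading of requests over several instances could be approximated on the client side. This is a minimal sketch, not the Kemp setup described above: the four localhost ports are hypothetical host ports assumed to be mapped to four separate containers, and `round_robin` is an illustrative helper name.

```python
import itertools
from typing import Iterator, List

# Hypothetical endpoints: four instances assumed to be published on
# host ports 8081-8084, each container limited to 3 CPUs.
ENDPOINTS: List[str] = [
    f"http://localhost:{port}/facebox/check" for port in (8081, 8082, 8083, 8084)
]


def round_robin(endpoints: List[str]) -> Iterator[str]:
    """Yield endpoints in a repeating cycle so each request goes to the next instance."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return itertools.cycle(endpoints)
```

Calling `next()` on the returned iterator before each request picks the target instance. Unlike a real load balancer, this does no health checking, so a crashed instance would keep receiving its share of requests.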