matiasdelellis / facerecognition

Nextcloud app that implement a basic facial recognition system.
GNU Affero General Public License v3.0
510 stars 46 forks source link

Multiple external model instances #743

Open 36grad opened 5 months ago

36grad commented 5 months ago

Summary: with this change multiple instances of the external model can be used to analyze multiple images in parallel.

Motivation: Since the recent change that analysis now uses more than one core the external model is kind of pointless (because it's only analyzing one image in one thread / using one core). That is especially sad if you have (for whatever reasons) a machine with a lot of cores and RAM that could easily analyze more than one image at one.

Enter "Multiple external model instances": With this change you can use multiple external model instances in parallel. For that, the new ImageProcessingWithMultipleExternalModelInstancesTask has been added which brings together the code from the original ImageProcessingTask and the ExternalModel and whips that up with curl_multi_exec to asynchronically send image analysis requests to the external model instances.

Configuration: All relevant settings on the facerecognition app side can be set using the occ face:setup command. Usage is displayed with occ face:setup -h.

The by far easiest solution is to use docker compose for the external model. There are only two changes to the docker-compose.yml required:

  1. you need to map a port range to the containers internal port rather than a single port mapping. The number of ports must be at least the number of instances: The following configuration line will use port 8083 through 8090.
     ports:
       - "8083-8090:5000"
  2. you specify that replicas shall be deployed: The following configuration line will result in a total of 8 instances of the external model.
     deploy:
       mode: replicated
       replicas: 8

    Each replica will listen on subsequent ports, starting with the first port in above configured port range.

Changes to facerecognition: I tried to make as little chanegs to the original code as possible. Three things were unavoidable:

  1. Add the related system settings to SettingsService.
  2. Change the FaceRecognitionBackgroundTask to add a cleanup method in order to be able to react to timeout.
  3. Introduce a new class for the Task and schedule it in the BackgroundService.

Besides that, I modified the SetupCommand to have access to more settings and to show the current settings a bit nicer :-) Also I have added Nextcloud's internal logger to the FaceRecognitionContext so that we can send messages to the Nextcloud log file.

Final words: I have tried this on my server and on a machine with 16GB memory I can run 8 instances in parallel (with a temp image size of 1600x1200).

matiasdelellis commented 5 months ago

Hi @36grad Thank you very much for all this effort, but it is very unlikely that I will accept it as such..

There are changes that really interest me, for example the changes in face:setup, but the rest of the implementation, although what you did is really great, adds a lot of unnecessary complexity (Why multiple ports? I never saw that in an api, but I understand what you wanted to do).

Since the https://github.com/matiasdelellis/facerecognition/discussions/716#discussioncomment-8109403 the external model is kind of pointless (because it's only analyzing one image in one thread / using one core).

This is not true, although unfortunately I am not having time to implement it. 😞

See: https://github.com/matiasdelellis/facerecognition/discussions/716#discussioncomment-8145358

We have to make just this change. You dare? 😃

36grad commented 5 months ago

OK to be honest I did not fully understand the last comment you have referred to, so I need to look into this a bit deeper...

But from a first quick glance it sounds like it looks like it will also spawn multiple instances of the external model (but additionally act as a loadbalancer). Then there's still the question how you feed the images parallely to the external model?

The have included the option to have the instance not only behind a load balance but accessible individually via conscutive ports because

  1. you can have this with almost zero configuration effort straight out of docker (and docker compose too).
  2. you don't need to configure a loadbalancer (which if I understood this gunicorn right is in this case not as complex as I though).
  3. if you have a laodbalancer it will work fine as well :-)

Or did I misunderstand can one single background_job analyze multiple images in parallel when combining with this gunicorn thing for the external model you mentioned?

Besides that: You can also of course only cherry pick the changes to the setup you like, no problem. I have one question here, why did you make the external model settings system settings and not application settings? Was this just a compromise to have a fast implementation because architecture wise it looks a bit inconsistent -> something we could improve then too.

matiasdelellis commented 5 months ago

Hi, Speaking of the external model, using gunicorn solves absolutely everything. Ideally we can think of a load balancer etc., but this should be done outside of our service.

About the local background_job task that makes the calls to the external model unfortunately it is not parallel. The use of file-level locks allows us to run multiple instances of the same process, but without conflicts in processing.

I just realized that I never posted this!. 🤦🏻‍♂️

#!/bin/bash
echo "Synchronizing user files..."
php occ face:background_job -u user --sync-mode
echo "Done"

echo "Analyzing user files..."
for i in {1..4}; do
    php occ face:background_job -u user --analyze-mode &
    pids[${i}]=$!
done

for pid in ${pids[*]}; do
    wait $pid
done
echo "Done"

echo "Calculating user face clusters..."
php occ face:background_job -u user --cluster-mode
echo "Done"

This would be the way to use all the new options..

p.s: Although I would like to make it 100% php within the backgroudn_job command, there is no standard api for multiple processes in php. Ideally we should use pthreads (officially deprecated module in php) to make the calls. and therefore in the meantime this is a small hack to obtain the expected results-

36grad commented 5 months ago

As you said, pthreads is deprecated so I had tried to use PHP parallel which is supposed to replace it I thinks. While I was able to build a docker image with PHP ZTS and install th eparallel module, the script crashed whenever I tried to start a simple thread - I must have still been doing something wrong.

That is why I moved on to use asynchronous cURL requests in this pull request. This way, in a single PHP thread we can use multiple external model instances in parallel.

In the end I guess your current implementation will achieve the same and it does not really matter if you schedule a bash script or the PHP task in cron. --> This script should then ideally be included in the app. The only thing that should be added to the bash is a timeout parameter for the script to be passed to the analyze background jobs and before the cluster background job it should be checked if time has run out and if not, a properly recalculated timeout needs to be passed to the cluster background job.