johnolafenwa / DeepStack

The World's Leading Cross Platform AI Engine for Edge Devices
Apache License 2.0

Jetson docker image consuming a lot of resources when idling #46

Open MFornander opened 3 years ago

MFornander commented 3 years ago

Awesome work!

Everything is running fine, but I'm surprised the CPU is working this much when idling. I was looking at the code, guessing the main loop is not waiting for input, but ran out of time before finding where detection.py's objectdetection() is called and what delay parameter is used.

Running docker with: sudo docker run --runtime nvidia --restart unless-stopped -e VISION-DETECTION=True -p 80:5000 deepquestai/deepstack:jetpack-x3-beta


CONTAINER ID    NAME         CPU %   MEM USAGE / LIMIT     MEM %   NET I/O         BLOCK I/O       PIDS
5f58358a348c    hopeful_gnu  31.63%  1.475GiB / 3.863GiB   38.18%  51.5MB / 305kB  254MB / 45.1kB  15
MFornander commented 3 years ago

Ok, looking at the code I see SLEEP_TIME = 0.01 in shared.py, which is unfortunate since all the other options allow environment overrides. All the docker images supply ENV SLEEP_TIME 0.01, so it looks like it was just a miss.

Short term solution: Change line 38 in shared.py from SLEEP_TIME = 0.01 to SLEEP_TIME = float(os.getenv("SLEEP_TIME", 0.01))

I don't do much Python, but the float() conversion should be needed since os.getenv returns a string when the variable is set.

Long term solution: Don't busy wait with a delay. Wake up the thread when there is a new image processing request.
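
For context, the loop in question looks roughly like this (a sketch with made-up names like IMAGE_QUEUE and process_batch, not the actual DeepStack code):

import os
import time

import redis

# os.getenv returns a str when the variable is set, hence the float() conversion
SLEEP_TIME = float(os.getenv("SLEEP_TIME", 0.01))

db = redis.StrictRedis(host="localhost", port=6379)

def process_batch(items):
    # stand-in for the real detection call in detection.py
    print("processing", len(items), "request(s)")

while True:
    batch = db.lrange("IMAGE_QUEUE", 0, 7)       # poll the request queue
    if batch:
        db.ltrim("IMAGE_QUEUE", len(batch), -1)  # drop what we took
        process_batch(batch)
    time.sleep(SLEEP_TIME)                       # the busy-wait delay burning CPU at idle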

MFornander commented 3 years ago

Ok, lunch is over so back to work here. I looked into Redis and redis-py, and it seems there is a Pub/Sub system for subscribing to database events. A kinder (to the CPU) approach would be to subscribe to events on the IMAGE_QUEUE key and put the detection loop in the callback, as described here:

https://redis.io/topics/notifications https://stackoverflow.com/questions/55112705/redis-python-psubscribe-to-event-with-callback-without-calling-listen
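
Something like this sketch with redis-py (the key name, db index, and handler are my assumptions, untested):

import redis

r = redis.StrictRedis(host="localhost", port=6379)

# keyspace notifications are off by default: K = keyspace events, l = list commands
r.config_set("notify-keyspace-events", "Kl")

def on_queue_event(message):
    # fires on every list operation against IMAGE_QUEUE (e.g. lpush);
    # drain the queue and run detection here instead of polling
    item = r.lpop("IMAGE_QUEUE")
    if item is not None:
        print("got request:", item)

p = r.pubsub()
p.psubscribe(**{"__keyspace@0__:IMAGE_QUEUE": on_queue_event})
worker = p.run_in_thread(sleep_time=0.5)  # delivers callbacks on a background thread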

I may look at making this change next week in a fork, but I just thought I would think out loud here in case someone else is looking at this.

johnolafenwa commented 3 years ago

Thank you @MFornander! This is a great suggestion. I am looking forward to the PR.

You might find this useful: https://aioredis.readthedocs.io/en/v1.3.0/examples.html
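
A minimal example in the style of those docs (aioredis 1.x API; the channel name is just illustrative):

import asyncio

import aioredis

async def main():
    redis = await aioredis.create_redis("redis://localhost")
    channel, = await redis.subscribe("image-requests")
    async for message in channel.iter():  # blocks until a message arrives, no polling
        print("got request:", message)
    redis.close()
    await redis.wait_closed()

asyncio.get_event_loop().run_until_complete(main())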

MFornander commented 3 years ago

I managed to fork, build the go server, and build a new docker image, but I'm having issues getting a working image. You guys are probably busy, but if you complete the Build from Source section, you may get more people helping out.

Where I am right now:

  1. Fork DeepStack to mattiasf/DeepStack
  2. Clone my fork to Jetson (building locally)
  3. Download arm64 go compiler from https://golang.org/dl/
  4. Compile go server: cd server && go build && cd ..
  5. Build docker image: docker build -t mattiasf/deepstack:jetpack -f Dockerfile.gpu-jetpack .
  6. Run local image: sudo docker run --runtime nvidia --restart unless-stopped -e MODE=High -e VISION-DETECTION=True -e SLEEP_TIME=1.0 -p 80:5000 mattiasf/deepstack:jetpack

Server comes up but doesn't respond to REST calls. What's missing? I can't really help without getting the unmodified fork up and running.
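
For reference, a minimal smoke test against the standard DeepStack detection endpoint (host port 80 as mapped above):

import requests

with open("test.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:80/v1/vision/detection",
        files={"image": f},
    )
print(response.status_code)
print(response.json())  # a healthy server returns {'success': True, 'predictions': [...]}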

johnolafenwa commented 3 years ago

Hello, I will update the readme with build instructions today. In the meantime, when you run DeepStack you can view the logs to see what went wrong. I will add a guide for that too.

Do the following

  1. Get the name of the container with sudo docker ps
  2. Run sudo docker exec -it container-name
  3. Once inside the container, run apt-get install nano
  4. cd to /logs/
  5. Open the error logs with nano stderr.txt

Let me know what the content of the file is

MFornander commented 3 years ago

Thanks @johnolafenwa !

Had to make some changes to your steps and am including them here in case someone else follows along. Skipped nano too.

  1. sudo docker ps
  2. sudo docker exec -it {container-name} bash
  3. cat /app/logs/stderr.txt
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 66, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 755, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
johnolafenwa commented 3 years ago

Hello @MFornander, thanks for getting this to work and posting your steps. It appears that when you cloned the repo, you didn't fetch the model files with git lfs; this would have resulted in the model files being invalid.

You need to install git lfs and run git lfs fetch from the repo root. This will fetch all the model files; then build DeepStack afterwards.
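
A quick way to confirm this (the model path below is just an example): a repo cloned without git lfs contains small text pointer files instead of binary weights, and the first byte is exactly the 'v' in the unpickling error above, from the word "version".

# hypothetical path; adjust to wherever the .pt model lives in your checkout
with open("intelligencelayer/shared/models/yolov5m.pt", "rb") as f:
    head = f.read(64)
print(head)
# a git-lfs pointer starts with: b'version https://git-lfs.github.com/spec/v1'
# real model weights start with a binary pickle/zip header instead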

Ryushin commented 3 years ago

I found out Blue Iris just added native support for DeepStack, instead of using AITool as middleware between BI and DeepStack. It also supports DeepStack on Windows, and I saw there was a new build. I installed DeepStack on Windows and everything seemed to run great. I noticed, though, that Python was consuming 10-20% of the CPU (a fairly powerful CPU). I had read that DeepStack on Windows is not as efficient, so I thought that was where the Python resource usage was coming from.

So I went back to my Linux docker image and upgraded it to the latest. Lo and behold, python is also running the system hard. PS shows: "python3 /app/intelligencelayer/shared/detection.py"

I tried deleting the entire image and starting from a fresh pull and it's still occurring. My docker start command:

docker run --detach --name=deepstack --restart=always -e MODE=High -e VISION-DETECTION=True -e VISION-FACE=True -v localstorage:/datastore -p 5000:5000 deepquestai/deepstack:latest

I think I might need to roll back to an older version, as the resource hit is not acceptable for an idle application.

Ryushin commented 3 years ago

I rolled back a zfs snapshot that contained the docker image from earlier this morning, and the system is running well again. Not sure how to identify the DeepStack version:

docker images
REPOSITORY              TAG       IMAGE ID       CREATED        SIZE
deepquestai/deepstack   latest    8b917481f961   2 months ago   2.97GB
Ryushin commented 3 years ago

So I spoke too soon about the older version not having the idle resource issue. It seems that as soon as Blue Iris connects to DeepStack, the resource usage goes up, old version included. Not sure what Blue Iris is doing in the background.

somebody-somewhere-over-the-rainbow commented 3 years ago

Same issue here with the normal docker image. The DeepStack container is idling around 10-15% on a Xeon E-2224 CPU under Xen, with python3 creating the load. Hope there will be a fix soon; the load should not be that high when it is not doing anything. @johnolafenwa: any idea what is creating this idle usage?

Ryushin commented 3 years ago

I worked with Ken at Blue Iris to figure out the problem that was causing the load and a memory leak. When using PTZ cameras on a patrol, having the per-camera DeepStack option "Detect/ignore static objects" enabled caused this problem. Disabling it made everything happy again.

somebody-somewhere-over-the-rainbow commented 3 years ago

I am not using Blue Iris. I just irregularly send pictures to the docker container (which only has VISION-FACE=True enabled) for face recognition.