notAI-tech / NudeNet

Lightweight nudity detection
https://nudenet.notai.tech/
GNU Affero General Public License v3.0
1.76k stars, 342 forks

Improve video detection efficiency for web servers #72

Open: Andrew-Chen-Wang opened this issue 3 years ago

Andrew-Chen-Wang commented 3 years ago

Video detection on large videos takes quite a bit of time, and it isn't really feasible in a server setting (even when just testing on my laptop, memory consumption was a concern). Here are some recommendations that should be fairly easy to implement:

  1. Instead of preprocessing all the frames up front, preprocess them in batches and score each batch. If no score meets the threshold, preprocess the next batch. Return immediately once a score meets or exceeds the threshold.
  2. Although this may be out of scope for this repository, I would like a function that simply returns a bool, rather than results for every frame, as soon as a score reaches or exceeds a given threshold. The relevant code is here: https://github.com/notAI-tech/NudeNet/blob/7b69fe1ea593731b9d544ad2f9dbcd8fcd3de54c/nudenet/detector.py#L115-L129
  3. Perhaps the preprocessed images should be stored in temporary files rather than in memory? Probably not worth it if we follow point 1, though.
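
Points 1 and 2 together could look roughly like the sketch below: frames are pulled lazily in fixed-size batches, each batch is scored, and the function returns `True` as soon as any score reaches the threshold. This is a minimal sketch, not NudeNet's API: `score_batch` is a hypothetical stand-in for whatever preprocesses a batch and runs the classifier, and `exceeds_threshold`, `batched`, and the default `batch_size` are illustrative names and values.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def batched(items: Iterable[Frame], size: int) -> Iterator[List[Frame]]:
    """Yield consecutive lists of at most `size` items without materializing the input."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def exceeds_threshold(
    frames: Iterable[Frame],
    score_batch: Callable[[List[Frame]], Sequence[float]],  # hypothetical: preprocess + classify
    threshold: float,
    batch_size: int = 32,  # illustrative default, not tuned
) -> bool:
    """Return True as soon as any frame's score meets or exceeds `threshold`.

    Only one batch is held (and preprocessed) at a time, so peak memory is
    bounded by `batch_size` rather than by the length of the video.
    """
    for batch in batched(frames, batch_size):
        if any(score >= threshold for score in score_batch(batch)):
            return True  # early exit: remaining frames are never pulled from `frames`
    return False
```

If `frames` is itself a lazy iterator over the video, nothing past the first offending batch is ever decoded.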

Just some thoughts to help people make this easier. Yes, you can offload this process to a different server, but even still, the process could be sped up. Thanks!

Additionally, in my opinion the default ONNX files should be packaged with the library instead of being written to people's home directories. But that's external, and I don't mind too much :P You could use Git LFS with GitHub and store the models the way tf does.

Andrew-Chen-Wang commented 3 years ago

Or we can simply rewrite:

https://github.com/notAI-tech/NudeNet/blob/7b69fe1ea593731b9d544ad2f9dbcd8fcd3de54c/nudenet/video_utils.py#L65-L76

Specifically, at line 76, where the video is actually read, we could rewrite detect_video to process the video frame by frame. We could keep the frames in memory to save time, or not store them at all to conserve memory.
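
A frame-by-frame reader along those lines could be a generator around OpenCV's `cv2.VideoCapture`, whose `read()` returns an `(ok, frame)` pair. This is a sketch under assumptions: `iter_frames` and its `every_n` sampling parameter are hypothetical names, not existing NudeNet functions, and the capture object is passed in so the generator itself does not depend on OpenCV.

```python
from typing import Any, Iterator, Protocol, Tuple


class FrameSource(Protocol):
    """Anything with a cv2.VideoCapture-style read() -> (ok, frame)."""

    def read(self) -> Tuple[bool, Any]: ...


def iter_frames(capture: FrameSource, every_n: int = 1) -> Iterator[Tuple[int, Any]]:
    """Yield (frame_index, frame) one at a time instead of reading the whole video.

    Only the current frame is kept alive by the generator; frames skipped by
    the `every_n` sampling are still decoded by read() but dropped immediately.
    """
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of stream or decode failure
            return
        if index % every_n == 0:
            yield index, frame
        index += 1
```

With OpenCV you would pass `cv2.VideoCapture("vid.mp4")` and call its `release()` method when done. Because the caller drives the loop, it can stop as soon as a frame crosses a threshold, which is exactly the memory/time trade-off described above: nothing is stored unless the caller chooses to keep it.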

Andrew-Chen-Wang commented 3 years ago

By the way, was the classification model updated to use those 160,000 auto-labeled images?

Andrew-Chen-Wang commented 3 years ago

There's a memory spike. It happens in get_interest_frames_from_video, and I assume it also happens in load_images:

```
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     9     80.0 MiB     80.0 MiB           1   @profile
    10                                         def main():
    11    173.3 MiB     93.3 MiB           1       classifier = NudeClassifier()
    12    173.3 MiB      0.0 MiB           1       file = str(Path(__file__).parent.absolute() / "pic.jpeg")
    13    173.3 MiB      0.0 MiB           1       video_file = str(Path(__file__).parent.absolute() / "vid.mp4")
    14                                         
    15    173.3 MiB      0.0 MiB           1       print(datetime.now())
    16    214.5 MiB     41.2 MiB           1       print(classifier.classify(file))
    17                                         
    18    214.5 MiB      0.0 MiB           1       print("Video")
    19    214.5 MiB      0.0 MiB           1       print(datetime.now())
    20    189.4 MiB    -25.1 MiB           1       print(classifier.classify_video(video_file))
```
Traceback:

```
[ 94 96 137]]]
OpenCV(4.5.1) /tmp/pip-req-build-_a0ur5ao/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 2764800 bytes in function 'OutOfMemoryError'
Traceback (most recent call last):
  File "/home/ec2-user/blah/nudenet/image_utils.py", line 133, in load_images
    image = load_img(img_path, target_size=image_size)
  File "/home/ec2-user/blah/nudenet/image_utils.py", line 56, in load_img
    path = cv2.cvtColor(path, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.5.1) /tmp/pip-req-build-_a0ur5ao/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 2764800 bytes in function 'OutOfMemoryError'
ERROR:root:Error reading [[[ 70 81 75] [ 70 81 75] [ 70 81 75] ... [ 25 27 34] [ 25 27 34] [ 25 27 34]]
 [[ 70 81 75] [ 70 81 75] [ 70 81 75] ... [ 25 27 34] [ 25 27 34] [ 25 27 34]]
 [[ 70 81 75] [ 70 81 75] [ 70 81 75] ... [ 25 27 34] [ 25 27 34] [ 25 27 34]]
 ...
 [[ 67 72 72] [ 69 74 74] [ 69 74 74] ... [ 95 98 130] [ 96 99 131] [ 96 99 131]]
 [[ 67 72 72] [ 68 73 73] [ 69 74 74] ... [ 96 98 139] [ 97 99 140] [ 97 99 140]]
 [[ 66 71 71] [ 68 73 73] [ 69 74 74] ... [ 93 95 136] [ 94 96 137] [ 94 96 137]]]
OpenCV(4.5.1) /tmp/pip-req-build-_a0ur5ao/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 2764800 bytes in function 'OutOfMemoryError'
Traceback (most recent call last):
  File "/home/ec2-user/blah/nudenet/image_utils.py", line 133, in load_images
    image = load_img(img_path, target_size=image_size)
  File "/home/ec2-user/blah/nudenet/image_utils.py", line 56, in load_img
    path = cv2.cvtColor(path, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.5.1) /tmp/pip-req-build-_a0ur5ao/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 2764800 bytes in function 'OutOfMemoryError'
ERROR:root:Error reading [[[ 72 80 77] [ 72 80 77] [ 72 80 77] ... [ 24 24 31] [ 24 24 31] [ 24 24 31]]
 [[ 72 80 77] [ 72 80 77] [ 72 80 77] ... [ 24 24 31] [ 24 24 31] [ 24 24 31]]
 [[ 72 80 77] [ 72 80 77] [ 73 81 78] ... [ 24 24 31] [ 24 24 31] [ 24 24 31]]
 ...
 [[ 70 69 63] [ 70 69 63] [ 70 69 63] ... [ 88 93 133] [ 88 93 133] [ 88 93 133]]
 [[ 71 70 64] [ 71 70 64] [ 69 67 63] ... [ 89 96 141] [ 89 96 141] [ 89 96 141]]
 [[ 73 72 66] [ 70 69 63] [ 70 68 64] ... [ 87 94 139] [ 87 94 139] [ 87 94 139]]]
OpenCV(4.5.1) /tmp/pip-req-build-_a0ur5ao/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 2764800 bytes in function 'OutOfMemoryError'
Traceback (most recent call last):
  File "/home/ec2-user/blah/nudenet/image_utils.py", line 133, in load_images
    image = load_img(img_path, target_size=image_size)
  File "/home/ec2-user/blah/nudenet/image_utils.py", line 56, in load_img
    path = cv2.cvtColor(path, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.5.1) /tmp/pip-req-build-_a0ur5ao/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 2764800 bytes in function 'OutOfMemoryError'
```

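
The failed allocation size in the log is telling: 2,764,800 bytes is exactly 1280 * 720 * 3, i.e. one 720p frame with three 8-bit colour channels. A back-of-the-envelope estimate of what holding every decoded frame costs is below. It assumes every frame is kept in memory at full resolution as uint8, and the 60 s / 30 fps clip is an illustrative figure, not a measurement of vid.mp4.

```python
def frame_bytes(width: int, height: int, channels: int = 3, bytes_per_value: int = 1) -> int:
    """Size of one decoded frame as a dense array (uint8 by default)."""
    return width * height * channels * bytes_per_value


def all_frames_bytes(duration_s: float, fps: float, width: int, height: int) -> int:
    """Memory needed to keep every decoded frame of a clip in RAM at the same time."""
    n_frames = int(duration_s * fps)
    return n_frames * frame_bytes(width, height)


# One 720p RGB frame matches the allocation that failed in the traceback.
print(frame_bytes(1280, 720))  # 2764800
# An illustrative 60-second, 30 fps, 720p clip held entirely in memory:
print(round(all_frames_bytes(60, 30, 1280, 720) / 2**30, 2), "GiB")  # 4.63 GiB
```

Memory grows linearly with clip length under this assumption, which is why batching or streaming frames (and discarding them after scoring) caps the footprint instead.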
bedapudi6788 commented 3 years ago

@Andrew-Chen-Wang I agree that the memory performance here can be vastly improved. Making get_interest_frames_from_video and detect_video return iterables is probably the easiest way to do this. I will implement it when I get some time.

Returning an iterator would also directly give you the bool functionality you're looking for. Feel free to submit a PR for these if you have the time.
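
The reason an iterator gives the bool behaviour almost for free is that `any()` stops consuming a generator at the first true value. In the sketch below, `detect_video_iter` is a hypothetical stand-in that yields precomputed per-frame scores (it is not the current NudeNet signature), and the `0.8` threshold is an arbitrary illustrative default.

```python
from typing import Iterable, Iterator, List


def detect_video_iter(frame_scores: Iterable[float], consumed: List[float]) -> Iterator[float]:
    """Hypothetical lazy detector: yields one score per frame and records each one it produces."""
    for score in frame_scores:
        consumed.append(score)  # stands in for the cost of decoding + classifying one frame
        yield score


def is_unsafe(scores: Iterable[float], threshold: float = 0.8) -> bool:
    """True as soon as one frame scores at or above `threshold`; later frames are never computed."""
    return any(score >= threshold for score in scores)


consumed: List[float] = []
print(is_unsafe(detect_video_iter([0.1, 0.2, 0.95, 0.3, 0.99], consumed)))  # True
print(len(consumed))  # 3: frames after the first hit were never processed
```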

> Additionally, the default ONNX files in my opinion should be packaged instead of writing to people's home directories. But that's external, and I don't mind too much :P You can use Git LFS with GitHub and store the models like how tf does it.

Git LFS is not free (as far as I am aware). In any case, hosting large static files separately and downloading the required ones on initialisation is, in my opinion, the simplest option and the one with the fewest dependencies.
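
Download-on-initialisation can be done with the standard library alone. A minimal sketch, assuming a hypothetical model URL and cache directory (NudeNet's actual file names and locations may differ): the file is fetched only if it is not already cached, and it is written to a temporary name first so an interrupted download never leaves a truncated model in place.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def ensure_model(url: str, cache_dir: Path) -> Path:
    """Return a local path to the model file, downloading it only on first use."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if target.exists():
        return target  # already cached: no network access
    # Download into a temp file in the same directory, then atomically rename it into place.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp_name)
        os.replace(tmp_name, target)
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # never leave a partial file behind
        raise
    return target
```

A call such as `ensure_model("https://example.com/models/classifier.onnx", Path.home() / ".cache" / "nudenet")` (both values hypothetical) would download once and reuse the cached file afterwards.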

Andrew-Chen-Wang commented 3 years ago

> Feel free to submit a PR for these if you have the time.

For sure. It's definitely on my TODO list; hopefully I can find the time to work on it. But if you beat me to a PR or new commit (quite possible, since school takes up so much of my time), I'll definitely take a look and contribute.

Thanks for this repo!

> git lfs is not free (as far as I am aware)

My bad, I should have researched that better. Sorry, I'm just a little paranoid that a repository could be deleted all of a sudden, so I was recommending putting the models in git so people can fork them (currently, when you fork, the releases and tags do not transfer with it, which means the models do not transfer either).