hydrusvideodeduplicator / hydrus-video-deduplicator

Video Deduplicator for the Hydrus Network
https://hydrusvideodeduplicator.github.io/hydrus-video-deduplicator/
MIT License

ERROR: Failed building wheel for vpdq #5

Closed: b1n4ryj4n closed this issue 1 year ago

b1n4ryj4n commented 1 year ago

Hi there, I wanted to test your tool, but unfortunately I get this error:

`Building wheel for vpdq (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [4 lines of output]
    running bdist_wheel
    running build
    running build_ext
    error: [WinError 2] The system cannot find the file specified
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for vpdq
Running setup.py clean for vpdq
Successfully built hydrus-video-deduplicator
Failed to build vpdq`

I'm using Windows 10 x64 with Python 3.11.4 and ffmpeg version 6.0-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers built with gcc 12.2.0 (Rev10, Built by MSYS2 project)

Do you know a way to install vpdq?

appleappleapplenanner commented 1 year ago

Thank you for the detailed error message.

Since vpdq is partially written in C++, you will also need ~Visual Studio to compile and install vpdq on Windows.~

This is a PITA for most people so I will try to get some wheel binaries uploaded to PyPI so you don't have to deal with this.

EDIT: Actually, it's not so simple it appears. There's some stuff I have to edit in the vpdq repo I believe to get it to build on Windows. This is my top priority and I'm working on it.

appleappleapplenanner commented 1 year ago

Okay, I'm deep in the rabbit hole trying to get this to build on Windows. vpdq was clearly not meant to be built on Windows, and it uses some glibc functions. It also has a weird C/C++ mix, with file pointers instead of file streams even though it's written in C++?

But, it's a small library so I can either modify vpdq C++ code to work on Windows, or port it to Python. I'll do some tinkering.

EDIT: nm this is a fucking disaster. pdq is also difficult to get building on Windows. TMK isn't though, so I guess I could use that for only Windows users...?

appleappleapplenanner commented 1 year ago

Unfortunately, it appears it would be out of scope of this project to allow pdq and then vpdq to build on Windows.

I'll look around for some other perceptual hashers, but man vpdq is so good. If it's relatively similar performance or just a little worse, I will switch to that hasher or switch to it just for Windows for better compatibility. I want people to use this project as easily as possible.

But, for now, WSL is very easy to install and I would be happy to eventually make a Docker container to make it simpler (even though Docker depends on WSL now...). If I can make a script to install WSL2 and run the program that way I will do that.

It runs perfectly under WSL which is how I'm developing it right now. I'll include some steps in the README on how to get that running.

b1n4ryj4n commented 1 year ago

Thank you for your quick reply. I have installed it with WSL now with the wiki instructions.

But I needed to update step 1:

`sudo apt-get install python3-dev ffmpeg libavcodec-dev git` to be `sudo apt-get install python3-dev python3-pip ffmpeg libavcodec-dev git`

Now I'm struggling with the connection to my Hydrus instance (I can reach it with `curl https://myip:45869`).

If I change `--api-url` to "https://myip:45869", I get "Failed to connect to Hydrus."

Max retries exceeded with url: /api_version (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

I had already added my mkcert root certificate to WSL (Ubuntu) successfully, but somehow Python ignores it. So I needed to download an up-to-date ca-bundle and add my custom root CA to it. After that was done, I added `export REQUESTS_CA_BUNDLE="/home/hydrus/ca-bundle.pem"` to the .bashrc file.
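For anyone repeating this workaround, the bundle-building step can be sketched in Python. This is only an illustration under assumptions: the function name `build_ca_bundle` and the example paths in the comments are hypothetical, and both input files are assumed to be PEM-encoded.

```python
from pathlib import Path

PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def build_ca_bundle(base_bundle: Path, extra_root_ca: Path, out_path: Path) -> Path:
    """Write a new bundle containing the base CA bundle followed by one extra root CA.

    Hypothetical helper for illustration; both inputs are assumed to be PEM files.
    """
    parts = []
    for pem_file in (base_bundle, extra_root_ca):
        # CA bundles may carry non-ASCII comments, so read them as UTF-8.
        text = pem_file.read_text(encoding="utf-8").strip()
        if PEM_MARKER not in text:
            raise ValueError(f"{pem_file} does not look like a PEM certificate file")
        parts.append(text)
    out_path.write_text("\n".join(parts) + "\n", encoding="utf-8")
    return out_path


# Example call (paths are assumptions, adjust to your system):
# build_ca_bundle(Path("/etc/ssl/certs/ca-certificates.crt"),
#                 Path("rootCA.pem"),
#                 Path("/home/hydrus/ca-bundle.pem"))
```

The resulting file is what `REQUESTS_CA_BUNDLE` points at; the `requests` library reads that environment variable when verifying certificates.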

Now it seems to work (116/672364 [03:39<143:42:02, 1.30it/s]).

-- But somehow ffmpeg is, most of the time, unable to generate the video stream after I reach ~1024 files:

 Failed to calculate a perceptual hash.
  0%|▍                                                                                                                                                  | 2113/672364 [26:23<18:18:39, 10.17it/s]vpdqPY: ffmpeg to generate video stream failed
 Failed to calculate a perceptual hash.
vpdqPY: ffmpeg to generate video stream failed
 Failed to calculate a perceptual hash.
  0%|▍                                                                                                                                                  | 2115/672364 [26:23<18:25:18, 10.11it/s]vpdqPY: ffmpeg to generate video stream failed
 Failed to calculate a perceptual hash.
vpdqPY: ffmpeg to generate video stream failed
 Failed to calculate a perceptual hash.
  0%|▍                                                                                                                                                  | 2117/672364 [26:23<19:17:10,  9.65it/s]vpdqPY: ffmpeg to generate video stream failed
 Failed to calculate a perceptual hash.
  0%|▍                                                                                                                                                  | 2118/672364 [26:24<21:21:43,  8.72it/s]

and sometimes I get

[av1 @ 0x55a464d717c0] Your platform doesn't suppport hardware accelerated AV1 decoding.
[av1 @ 0x55a464d717c0] Failed to get pixel format.
[av1 @ 0x55a464d717c0] Missing Sequence Header.

and

│ │                      │   ... +21                                                             │ │
│ │                      ]                                                                       │ │
│ │    query_match_cnt = 0                                                                       │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ZeroDivisionError: division by zero
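
The traceback shows `query_match_cnt = 0` among the locals, which suggests a match percentage is being computed against an empty set of features (for example, a video whose hash could not be generated). The exact expression is not visible in the output, so the following is only a sketch of the usual guard; `match_percent` and its parameter names are hypothetical and not taken from vpdq.

```python
def match_percent(match_count: int, total_features: int) -> float:
    """Return the percentage of matched features, or 0.0 if there is nothing to compare.

    Hypothetical helper: guards the division instead of raising ZeroDivisionError.
    """
    if total_features <= 0:
        return 0.0
    return 100.0 * match_count / total_features
```

Treating an empty hash as "0% similar" is one possible choice; skipping such files entirely before comparison would be another.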

appleappleapplenanner commented 1 year ago

Max retries exceeded with url: /api_version (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

I see. I was using http for my testing so I didn't encounter this problem. The default URL should use https with a fallback to http, but it doesn't do that yet.

The hydrus-api library I'm using doesn't appear to have a way to disable SSL verification, so I might fork it and then give users the option to disable SSL verification. This shouldn't be a big issue since almost everyone's instance is local, so a MITM attack is not a concern to me.

Otherwise, the program should first try to connect to the https version of the URL, then fall back to the http version with a message, and exit if that also fails.
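
The fallback order described above could look roughly like this. It is a sketch, not the project's code: `candidate_urls`, `connect_with_fallback` and the `probe` callback are hypothetical names, the URL is assumed to include a scheme, and the probe stands in for whatever check the client does (such as requesting `/api_version`).

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(api_url: str) -> list[str]:
    """Return the https form of api_url first, then the http form."""
    parts = urlsplit(api_url)
    return [urlunsplit(parts._replace(scheme=scheme)) for scheme in ("https", "http")]


def connect_with_fallback(api_url: str, probe: Callable[[str], bool]) -> str:
    """Try https, then http; return the first URL the probe accepts.

    `probe` is a hypothetical callback that returns True if the API answers.
    Raises ConnectionError if neither scheme works, so the caller can exit.
    """
    for url in candidate_urls(api_url):
        if probe(url):
            if url.startswith("http://"):
                print(f"https connection failed, falling back to {url}")
            return url
    raise ConnectionError(f"Failed to connect to Hydrus at {api_url}")
```

Passing the probe in as a function keeps the ordering logic separate from the HTTP client, so it can be checked without a running Hydrus instance.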

I will also try to generate a cert and write down the process so I can include it in the wiki.

I also noticed that in WSL you need to use the host name from `$(hostname)` in the URL, since localhost doesn't work, so I'll make sure to include that or set that as the default rather than localhost.
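
A default built from the machine's host name, mirroring `$(hostname)`, might be sketched as below. `default_api_url` is a hypothetical helper, the port is the one used earlier in this thread, and whether the name actually resolves to the Windows host depends on the WSL setup.

```python
import socket

HYDRUS_API_PORT = 45869  # port used in this thread; adjust if yours differs


def default_api_url(scheme: str = "https", port: int = HYDRUS_API_PORT) -> str:
    """Build an API URL from the local host name instead of localhost.

    Hypothetical helper; socket.gethostname() returns the same name as `hostname`.
    """
    return f"{scheme}://{socket.gethostname()}:{port}"
```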


The AV1 error is either an OpenCV error or FFmpeg error, so it could be from a missing decoder in FFmpeg. I tried my own av1 file and I also got Missing Sequence Header so I'm looking into it. It could be a big enough pain to just disable AV1 hashing for now, but AV1 support is very important to me.

The 1024 file issue I can't solve unless I get more information so I'm going to create a larger library. If you run the program with --debug or --verbose it will give you more information.

Your video library is roughly 10,000x larger than the one I'm testing with so you're going to probably encounter some bugs I won't. I'll try to download at least 10,000 videos to test better.

Also, FYI: I just changed a ton of stuff:

I renamed the package from hydrus_video_deduplicator to hydrusvideodeduplicator, moved the database file from /hydrus-video-deduplicator/thedb to an SQLite DB in ~/.local/share/hydrusvideodeduplicator/, and finally uploaded the package to PyPI so you can just do `pip install hydrusvideodeduplicator` without cloning anything. The old one should probably be uninstalled with `pip uninstall hydrus_video_deduplicator`.

Just letting you know in case you're wondering why anything broke. I'm not planning on making any other huge changes with regards to the package structure from here on out other than some CLI parameters. I also might change the database structure if necessary and in that case you will lose your perceptual hash cache if you update, but I would rather not do that.

appleappleapplenanner commented 1 year ago

Fixed the unsupported AV1 issue in fe23b76 with a band-aid: falling back to transcoding to avc1.
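
For readers curious what such a fallback can look like, here is a rough sketch of transcoding a file to H.264 (avc1) with the ffmpeg CLI. It is not the code from fe23b76: the function names are hypothetical, and it assumes an ffmpeg build that includes the `libx264` encoder.

```python
import subprocess
from pathlib import Path


def build_transcode_command(src: Path, dst: Path) -> list[str]:
    """Return an ffmpeg command that re-encodes the video stream as H.264.

    Hypothetical helper; assumes ffmpeg is on PATH and was built with libx264.
    """
    return [
        "ffmpeg",
        "-y",                  # overwrite dst if it already exists
        "-loglevel", "error",  # only print real errors
        "-i", str(src),
        "-c:v", "libx264",     # H.264 video (avc1)
        "-an",                 # drop audio
        str(dst),
    ]


def transcode_to_h264(src: Path, dst: Path) -> None:
    """Run the transcode; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_transcode_command(src, dst), check=True)
```

A caller would try hashing the original file first and only call `transcode_to_h264` when decoding fails, then hash the transcoded copy.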