hydrusvideodeduplicator / hydrus-video-deduplicator

Video Deduplicator for the Hydrus Network
https://hydrusvideodeduplicator.github.io/hydrus-video-deduplicator/
MIT License

Suggestion to properly handle corrupted/broken video files? #40

Closed Drakonas closed 6 months ago

Drakonas commented 10 months ago

When a corrupted video file is encountered, a full traceback fills the process output. Could the deduplicator recognize this case and print a short message instead of the entire traceback? It would be nice to have, but it's not a high priority.

Traceback (most recent call last):
  File "C:\Users\MaiDoreiku\AppData\Local\Programs\Python\Python311\Lib\site-packages\hydrusvideodeduplicator\dedup.py", line 152, in fetch_and_hash_file
    perceptual_hash = self.calculate_perceptual_hash(video_response.content)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MaiDoreiku\AppData\Local\Programs\Python\Python311\Lib\site-packages\hydrusvideodeduplicator\dedup.py", line 125, in calculate_perceptual_hash
    perceptual_hash = Vpdq.vpdq_to_json(Vpdq.computeHash(video))
                                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MaiDoreiku\AppData\Local\Programs\Python\Python311\Lib\site-packages\hydrusvideodeduplicator\vpdqpy\vpdqpy.py", line 166, in computeHash
    for second, frame in enumerate(Vpdq.frame_extract_pyav(video)):
  File "C:\Users\MaiDoreiku\AppData\Local\Programs\Python\Python311\Lib\site-packages\hydrusvideodeduplicator\vpdqpy\vpdqpy.py", line 150, in frame_extract_pyav
    for index, frame in enumerate(container.decode(video)):
  File "av\\container\\input.pyx", line 203, in decode
  File "av\\packet.pyx", line 83, in av.packet.Packet.decode
  File "av\\stream.pyx", line 177, in av.stream.Stream.decode
  File "av\\codec\\context.pyx", line 507, in av.codec.context.CodecContext.decode
  File "av\\codec\\context.pyx", line 416, in av.codec.context.CodecContext._send_packet_and_recv
  File "av\\codec\\context.pyx", line 440, in av.codec.context.CodecContext._recv_frame
  File "av\\error.pyx", line 307, in av.error.err_check
av.error.InvalidDataError: [Errno 1094995529] Invalid data found when processing input; last error log: [h264] Error splitting the input into NAL units.
Errored file hash: 443ed2cf638e46982a223f63630d6b4aa581418b31ea336a44012095b963d6b8

Here is the file for reference, so you can use it in testing: https://anontransfer.com/download/TRERaKcH3g/443ed2cf638e46982a223f63630d6b4aa581418b31ea336a44012095b963d6b8.mp4

I do not recommend playing the video file. It is SFW though. Link originated from here: https://twitter.com/kyattsu/status/1589500456682020864

I can provide the file another way if you'd like. It's hard to find free, non-annoying file upload sites. :/ The file is only 110 KB.
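
For illustration, here is a rough sketch of the kind of handling I have in mind, not the project's actual code. `hash_or_skip`, `compute_hash`, `decode_errors`, and the logger name are all hypothetical names made up for this example. In the real code, `compute_hash` would presumably be the hashing call shown in the traceback, and `decode_errors` would be narrowed to the decoder's exception type (the traceback shows `av.error.InvalidDataError`).

```python
import logging

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("dedup_sketch")


def hash_or_skip(file_hash, video, compute_hash, decode_errors=(Exception,)):
    """Return the perceptual hash, or None if the video cannot be decoded.

    compute_hash is a stand-in for the project's hashing call.
    decode_errors is the tuple of exception types treated as "corrupted
    file"; anything else is allowed to propagate as usual.
    """
    try:
        return compute_hash(video)
    except decode_errors as exc:
        # One short line for the user instead of a full traceback.
        log.warning("Skipping unreadable video %s: %s", file_hash, exc)
        # The full traceback is still available when DEBUG logging is on.
        log.debug("Decode failure details", exc_info=True)
        return None
```

With something like this, a corrupted file would produce a single warning line with the file hash, and the full traceback would only show up at debug log level.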