gurushida / mnemophonix

A simple audio fingerprinting system
MIT License

MKV audio extract fail with Segfault #4

Open VigibotDev opened 7 months ago

VigibotDev commented 7 months ago

Hello, I need to deduplicate video files that use many codecs, such as AC3 5.1 / EAC3 5.1, and that have more than one audio track. I think I need to write the ffmpeg part myself to downscale and extract every audio stream as pcm_s16le?

Otherwise it always ends with a segfault:

...
    Stream #0:0(fre): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 5.1(side), s16, 4233 kb/s (default)
    Metadata:
      title           : Français VFI AC3 5.1 @ 448 Kbps
      BPS             : 448000
      BPS-eng         : 448000
      DURATION        : 02:28:51.680000000
      DURATION-eng    : 02:28:51.680000000
      NUMBER_OF_FRAMES: 279115
      NUMBER_OF_FRAMES-eng: 279115
      NUMBER_OF_BYTES : 500174080
      NUMBER_OF_BYTES-eng: 500174080
      _STATISTICS_WRITING_APP: mkvmerge v7.9.0 ('Birds') 64bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v7.9.0 ('Birds') 64bit
      _STATISTICS_WRITING_DATE_UTC: 2015-11-04 17:32:41
      _STATISTICS_WRITING_DATE_UTC-eng: 2015-11-04 17:32:41
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      encoder         : Lavc58.91.100 pcm_s16le
[wav @ 0x558a274d1cc0] Filesize 4726645260 invalid for wav, output file will be broken
size= 4615865kB time=02:28:51.68 bitrate=4233.6kbits/s speed= 310x
video:0kB audio:4615864kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000004%
/tmp/Test movie title.mkv

Test movie title

Segmentation fault

gurushida commented 7 months ago

The problem here seems to be that the input file is long enough to produce a wav file larger than 4 gigabytes (4726645260 bytes), which the wav format does not support (see https://superuser.com/a/1523649). Unless you are willing to implement support for the RF64 header, which would allow such large files, the solution would be to split your input file into shorter clips and index each clip individually.
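As a rough check of the numbers, here is a minimal Python sketch that estimates the PCM size from the values in the ffmpeg log (44100 Hz, 5.1 = 6 channels, 16-bit samples, 02:28:51.68) and the longest clip that still fits in a classic wav file. The function names and the 44-byte header size are assumptions made for illustration; ffmpeg may write a slightly larger header.

```python
import math

# Classic WAV stores its chunk sizes in unsigned 32-bit fields.
WAV_MAX_BYTES = 2**32 - 1


def pcm_bytes_per_second(sample_rate, channels, bits_per_sample):
    """Raw PCM data rate in bytes per second."""
    return sample_rate * channels * bits_per_sample // 8


def max_clip_seconds(sample_rate, channels, bits_per_sample, header_bytes=44):
    """Longest whole-second clip that fits in a classic WAV file.

    header_bytes=44 is the minimal canonical header (an assumption here).
    """
    rate = pcm_bytes_per_second(sample_rate, channels, bits_per_sample)
    return (WAV_MAX_BYTES - header_bytes) // rate


def clips_needed(duration_seconds, clip_seconds):
    """Number of clips of at most clip_seconds needed to cover the input."""
    return math.ceil(duration_seconds / clip_seconds)


# Values taken from the ffmpeg log in this issue.
rate = pcm_bytes_per_second(44100, 6, 16)
duration = 2 * 3600 + 28 * 60 + 51.68
estimated_bytes = rate * duration

print(f"data rate: {rate} bytes/s")
print(f"estimated PCM size: {estimated_bytes:.0f} bytes")
print(f"exceeds WAV limit: {estimated_bytes > WAV_MAX_BYTES}")
print(f"longest clip that fits: {max_clip_seconds(44100, 6, 16)} s")
print(f"one-hour clips needed: {clips_needed(duration, 3600)}")
```

The estimate lands within a few hundred bytes of the size reported by ffmpeg, and it suggests that at this data rate anything longer than roughly 2 hours 15 minutes goes over the limit.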

VigibotDev commented 7 months ago

Very interesting. I tried Olaf in an advanced manner (using only the C binary), but its performance does not match what I need. I am looking for an audio fingerprint that allows detecting duplicate soundtracks in a collection of 24198+ videos. That is N * N = 585543204 comparisons (minus N for each video compared with itself), but I need a similarity percentage for each N * N result to produce interesting data.
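The comparison count can be checked with a short Python sketch. It also shows that, assuming the similarity measure is symmetric (comparing A with B gives the same score as B with A, which is an assumption and not something stated in this thread), only about half of the comparisons are needed. The function names are made up for illustration.

```python
from itertools import combinations


def ordered_pairs_excluding_self(n):
    """N * N comparisons minus the N self-comparisons."""
    return n * n - n


def unordered_pairs(n):
    """Distinct pairs when the similarity measure is symmetric."""
    return n * (n - 1) // 2


n = 24198
print(n * n)                            # every video against every video
print(ordered_pairs_excluding_self(n))  # without self-comparisons
print(unordered_pairs(n))               # each pair compared only once

# itertools.combinations enumerates each unordered pair exactly once.
demo_pairs = list(combinations(["a.mkv", "b.mkv", "c.mkv"], 2))
print(demo_pairs)
```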

VigibotDev commented 7 months ago

I need a fingerprint of the complete audio (about 2 hours long on average) to get a similarity factor or percentage for N*N.
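One generic way to turn two fingerprints into a percentage is the Jaccard index over their sets of hashes. The sketch below is purely illustrative: it assumes a fingerprint can be reduced to a collection of integer hashes, it ignores timing and alignment, and it is not how mnemophonix or Olaf score matches. The function name and sample values are hypothetical.

```python
def jaccard_percent(hashes_a, hashes_b):
    """Share of distinct hashes common to both fingerprints, as a percentage.

    Assumes each fingerprint is an iterable of hashable values (hypothetical
    representation, chosen for illustration only).
    """
    a, b = set(hashes_a), set(hashes_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: nothing to compare
    return 100.0 * len(a & b) / len(union)


# Made-up hash values standing in for two soundtracks.
track_1 = [1, 2, 3, 4]
track_2 = [3, 4, 5, 6]

print(f"{jaccard_percent(track_1, track_2):.2f}%")
print(f"{jaccard_percent(track_1, track_1):.2f}%")
```

A set-based score like this is cheap to compute per pair, but it would treat two tracks sharing the same sounds in a different order as similar, so it is only a starting point.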