henkit closed this issue 9 years ago
Dejavu only recognizes fragments of audio. It is, after all, fragments that make a whole.
The offset parameter shows you the particular time offset into the file where the "fragment" you matched lies.
If you follow the README.md you'll see how to do this for the example mp3 files included.
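As a rough illustration of what the offset means in seconds, here is a minimal sketch of the conversion. The three constants are assumptions taken to be Dejavu's fingerprinting defaults; check dejavu/fingerprint.py in your own checkout before relying on them. The helper name offset_to_seconds is hypothetical and not part of the library.

```python
# Assumed Dejavu fingerprinting defaults (verify against your installed version).
DEFAULT_FS = 44100            # sampling rate in Hz
DEFAULT_WINDOW_SIZE = 4096    # FFT window size in samples
DEFAULT_OVERLAP_RATIO = 0.5   # overlap between consecutive windows


def offset_to_seconds(offset,
                      fs=DEFAULT_FS,
                      window_size=DEFAULT_WINDOW_SIZE,
                      overlap_ratio=DEFAULT_OVERLAP_RATIO):
    """Convert an offset counted in spectrogram frames to seconds.

    Hypothetical helper: with a 0.5 overlap ratio, each frame advances by
    window_size * overlap_ratio samples.
    """
    return round(float(offset) / fs * window_size * overlap_ratio, 5)


if __name__ == "__main__":
    # A made-up offset value, only to show the call.
    print(offset_to_seconds(1000))
```

If the result you get back already carries an offset in seconds, prefer that value; this sketch only shows where such a number would come from.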
Hi worldveil,
Thanks for your response, much appreciated! Agreed, fragments make a whole, but I'm also trying to match the learned fragments against a whole file :) I did notice the offset, but what I do not see is whether I can set a start point from which to start fingerprinting again, so that I can step through a larger file that may contain other matches.
Cheers, Henk
From the database standpoint, you could "cut up" songs with ffmpeg if you wanted to have each fragment register as a different "song" in the database with its own id, name, etc. Then you could recognize fragments.
Otherwise of course you can look at groups of fingerprints and try to see when they match up.
I'm not entirely clear on your use case.
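The "cut up with ffmpeg" suggestion above could look roughly like the sketch below, which uses ffmpeg's segment muxer to split one file into fixed-length pieces. The function names and file names are hypothetical; the sketch assumes ffmpeg is installed and on the PATH, and that the output directory already exists.

```python
import subprocess


def build_segment_command(source, segment_seconds, out_pattern):
    """Return an ffmpeg command that splits `source` into fixed-length pieces."""
    return [
        "ffmpeg",
        "-i", source,
        "-f", "segment",                         # use the segment muxer
        "-segment_time", str(segment_seconds),   # target length of each piece
        "-c", "copy",                            # copy the stream, no re-encoding
        out_pattern,                             # e.g. parts/part_%04d.wav
    ]


def split_audio(source, segment_seconds, out_pattern):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.check_call(
        build_segment_command(source, segment_seconds, out_pattern))
```

For example, split_audio("recording.wav", 5, "parts/part_%04d.wav") would write 5 second pieces named part_0000.wav, part_0001.wav, and so on. Each piece can then be fingerprinted or recognized as its own file.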
Hi,
Thanks again for getting back! What I would like to do is monitor broadcast radio commercials. What I'm working on:
I hope this helps explain what my issue is. Maybe I have the wrong mindset here, but I do appreciate you having a look at this!
Cheers, Henk
I sometimes consult on this sort of thing, but I'll elaborate a bit here.
A very simple solution might be to extract 5-10 sec windows, save them to disk (or to memory with StringIO), and run recognition on each to decide whether it is a commercial or not (you would have to mark new commercials yourself, however).
Are there any particular functionalities you think could be added to Dejavu to make this easier?
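The windowing idea above can be sketched as a small generator that yields (start, duration) pairs over a recording. This is only an illustration: the function name and default values are hypothetical, and the overlapping hop is an assumption added so that a commercial straddling a window boundary still falls mostly inside some window.

```python
def sliding_windows(total_seconds, window_seconds=10.0, hop_seconds=5.0):
    """Yield (start, duration) pairs covering [0, total_seconds).

    With hop_seconds < window_seconds the windows overlap; the last
    window is shortened so it does not run past the end of the file.
    """
    if window_seconds <= 0 or hop_seconds <= 0:
        raise ValueError("window_seconds and hop_seconds must be positive")
    start = 0.0
    while start < total_seconds:
        duration = min(window_seconds, total_seconds - start)
        yield (start, duration)
        start += hop_seconds


if __name__ == "__main__":
    for start, duration in sliding_windows(25):
        print(start, duration)
```

Each pair maps directly onto ffmpeg's -ss (start) and -t (duration) options if you cut the windows to disk before recognizing them.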
Hi, to recognize audio from a file it would also be great to be able to set:
That way it would be easy to step through a larger file in pieces of, for instance, 10 sec and mark those pieces as known/unknown.
Cheers, Henk
If you can come up with a way that augments the current API and is easy to use, I'd gladly accept a PR!
Hi,
Thanks for your help, I'm now using ffmpeg to split the larger file into pieces of 5 seconds and use Dejavu for its purpose!
The one thing I run into now is that, besides recognizing the correct blocks, Dejavu also reports a result for 5 sec blocks that should not produce a match while I loop through the 5 sec blocks of audio. Dejavu then seems to return random matches; I do not get a "no result found". I've tested with compressed mp3 and uncompressed wav.
Cheers, Henk
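One possible way to get a "no result found" for weak matches, not confirmed anywhere in this thread, is to apply a confidence cutoff to whatever the recognizer returns. The sketch below assumes the result is a dict with a "confidence" entry (taken here to be the number of aligned fingerprint hashes); the helper name and the cutoff value are hypothetical and would need tuning on clips you know should and should not match.

```python
MIN_CONFIDENCE = 20  # hypothetical cutoff; tune it on known good and bad clips


def accept_match(result, min_confidence=MIN_CONFIDENCE):
    """Return `result` if it is confident enough, otherwise None.

    Assumes `result` is None or a dict with a numeric "confidence" key.
    """
    if not result:
        return None
    if result.get("confidence", 0) < min_confidence:
        return None
    return result


if __name__ == "__main__":
    # Made-up results, only to show the call.
    print(accept_match({"song_name": "ad_a", "confidence": 150}))
    print(accept_match({"song_name": "ad_b", "confidence": 3}))
```

In a recognition loop you would wrap each result in accept_match(...) and treat None as "unknown".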
Please provide examples and code to run against, otherwise I can't evaluate whether yours is a real problem.
Hi,
Thanks for your response, more than happy to. I'm using this to fingerprint the commercials:
from dejavu import Dejavu
import warnings
import json

warnings.filterwarnings("ignore")

# load config from a JSON file (or anything outputting a python dictionary)
with open("commercial.cnf") as f:
    config = json.load(f)

# create a Dejavu instance
djv = Dejavu(config)

# fingerprint all the .wav files in the directory we give it
djv.fingerprint_directory("../commercials/december", [".wav"])
And I use this piece of code to scan through a directory containing all the parts (currently 20 sec parts) of a larger recording that I split using ffmpeg:
from dejavu import Dejavu
from dejavu.recognize import FileRecognizer
import os, fnmatch

def find_files(directory, pattern):
    # walk the directory tree and yield every file matching the pattern
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename

config = {
    "database": {
        "host": "---",
        "user": "---",
        "passwd": "---",
        "db": "---"
    },
    "database_type" : "mysql",
}

# create a dejavu object
djv = Dejavu(config)

# gather files to recognize
UNLABELED_AUDIO_DIR = "../test/"
PATTERN = "*.wav"
audio_paths = find_files(UNLABELED_AUDIO_DIR, PATTERN)

# recognize them one at a time
original_file_to_song = {}
for path in audio_paths:
    print("Attempting to recognize %s..." % path)
    song = djv.recognize(FileRecognizer, path)
    original_file_to_song[path] = song

# see the songs you've recognized
for path, song in original_file_to_song.items():
    print("Audio file at: %s was recognized as %s" % (path, song))
I also created a zip of the files I'm testing on; you can download it here: www.henk.it/deja_vu/examples.zip (246MB). For example:
Hope this helps, please let me know if you need additional information.
I did some additional testing; matching on the known commercials gives a 100% match.
Cheers, Henk
Hi,
Any chance you were able to look into why I get matches when I shouldn't?
Cheers, Henk
On 1 Jan 2015 at 00:13, Will Drevo notifications@github.com wrote:
Please provide examples and code to run against, otherwise I can't evaluate whether yours is a real problem.
Hi worldveil,
Do you perhaps have an update for me? I don't understand why I get a match on fragments that should not return a match.
Cheers, Henk
I have the same situation: finding multiple fragments in one file. Is there any code that I can learn from? Thanks, henkit
Hi,
I've been playing around with Dejavu and am able to recognize one fingerprinted fragment, but what I would like is to recognize all the known audio fragments in one 10 min audio file. I've been reading through the issues but was not able to find similar questions or examples. Can anyone please help me out with this? I must also say that I'm a newbie to Python :)
Cheers, Henk
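For the goal described above, finding every known fragment in one long file, one approach is to recognize the file piece by piece and then merge consecutive pieces that matched the same name into a single detection with a start and end time. The sketch below covers only the merging step and is not part of Dejavu; it assumes equal-length, non-overlapping pieces in playback order, with None standing for "no match", and the function name is hypothetical.

```python
def merge_detections(window_results, window_seconds):
    """Collapse runs of consecutive windows that share the same match name.

    window_results: list with one entry per window, in playback order;
                    each entry is a name, or None for "no match".
    Returns a list of (name, start_seconds, end_seconds) tuples.
    """
    detections = []
    for index, name in enumerate(window_results):
        if name is None:
            continue
        start = index * window_seconds
        end = start + window_seconds
        last = detections[-1] if detections else None
        if last and last[0] == name and last[2] == start:
            # Same name and directly adjacent: extend the previous detection.
            detections[-1] = (name, last[1], end)
        else:
            detections.append((name, start, end))
    return detections


if __name__ == "__main__":
    # Made-up per-window names, only to show the call.
    print(merge_detections([None, "ad_a", "ad_a", None, "ad_b"], 5))
```

The per-window names would come from running recognition on each piece and keeping only the matched name (or None) for that piece.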