worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License

Slow search #206

Closed sagaya closed 4 years ago

sagaya commented 4 years ago

I have a lot of files fingerprinted already, but searching is slow; it sometimes takes 50 seconds. Is there a way to optimize this? Also, would it be better if I search with a WAV file instead of an MP3?

omertoptas commented 4 years ago

I have the same issue. I searched for 3 days and could not find any answer that solves the problem.

artgancz commented 4 years ago

There are several ways of overcoming this problem. Firstly, move the computations to the GPU (like in https://github.com/CwbhX/Jamais-Vu; for me, even a vanilla TensorFlow implementation in eager execution mode improved the speed). Secondly, while fingerprinting, max filtration is a real bottleneck, as pointed out in one of the pull requests. But instead of replacing it with two consecutive 1D filters without any guarantee of compatibility, you can make it faster by removing the erosion step and using just a normal 2D square max filter with the "neighborhood" increased 1.5 times, which is faster and more likely to stay compatible with the previous solution. Aligning could also be refactored slightly:

```python
largest = 0
largest_count = 0
song_id = -1
diff_counter = dict.fromkeys(matches, 0)
for tup in matches:
    diff_counter[tup] += 1
    if diff_counter[tup] > largest_count:
        largest_count = diff_counter[tup]
        song_id, largest = tup
```

Another experimental option would be to stop this loop when largest_count is greater than some threshold chosen beforehand.
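
To illustrate the max-filter suggestion, here is a minimal sketch of what the simplified peak finding might look like. It assumes dejavu's spectrogram array arr2D and a PEAK_NEIGHBORHOOD_SIZE-style constant; the exact sizes, threshold, and function name are illustrative, not dejavu's shipped code:

```python
import numpy as np
from scipy.ndimage import maximum_filter

# Illustrative constants; dejavu's real values live in its fingerprint module.
PEAK_NEIGHBORHOOD_SIZE = 20
AMP_MIN = 10

def get_2D_peaks_fast(arr2D, amp_min=AMP_MIN):
    """Find local maxima with a single square max filter, skipping the
    binary-erosion / iterated-structure step of the original implementation."""
    # Square neighborhood ~1.5x larger than the original, as suggested above.
    size = int(PEAK_NEIGHBORHOOD_SIZE * 1.5)
    local_max = maximum_filter(arr2D, size=size) == arr2D

    # Keep only peaks above the minimum amplitude.
    peaks = local_max & (arr2D > amp_min)
    freq_idx, time_idx = np.nonzero(peaks)
    return list(zip(freq_idx, time_idx))
```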

omertoptas commented 4 years ago

Aligning also could be refactored slightly

Hello, firstly thanks for the answer. I have already changed the aligning a bit and it works fine. I also followed your suggestions about finding peaks while fingerprinting, and that worked fine too. However, my real problem is the MySQL part. I tracked the problem down and found that most of the time is spent here: cur.execute(query, split_values), which is inside the return_matches function in database_sql.py. Currently I am investigating the issue, and if I find something I will post it here.
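
For anyone who wants to reproduce the measurement, a rough way to confirm where the time goes is to wrap the query call with a timer. This is just a sketch assuming a MySQLdb-style cursor and the same query / split_values variables used in return_matches, not dejavu's actual code:

```python
import time

# Around the existing call inside return_matches (illustrative only):
start = time.time()
cur.execute(query, split_values)   # the suspected bottleneck
rows = cur.fetchall()
print("query executed in %.2f s, returned %d rows"
      % (time.time() - start, len(rows)))
```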

gitteraz commented 4 years ago

@omertoptas I have the same database issue using PostgreSQL. Have you found a solution?

omertoptas commented 4 years ago

@omertoptas I have the same database issue using PostgreSQL. Have you found a solution?

Hi, unfortunately I was not able to find a solution to that, and I am also surprised that you have the same problem with PostgreSQL, because I thought the problem was specific to MySQL. I was trying to create the same database with PostgreSQL these days, but after seeing your comment, I think I'll give up on that approach.

From my trials under several different conditions, I think most of the time is spent fetching the data from the HDD with the following code: cur.execute(query, split_values). I found that when you insert fingerprints into the database, the values are kept in RAM first and then copied to the hard disk. So, as in the example case in dejavu's README.md, if you fingerprint a small number of music files (about 50-100) and then try to recognize some of them from the microphone, the result comes back almost instantly (in 0.5-1 seconds). This is because most of the fingerprint hashes you inserted are still cached in RAM, and fetching data from RAM is thousands of times faster than fetching it from an HDD.

However, once you shut down or restart your PC, that RAM cache is cleared and the cached fingerprint values are gone; the next time you want to recognize a song with the microphone, MySQL has to fetch the data from the database stored on your hard disk, and that takes quite a lot of time. I think if you have a database with a lot of music files (more than 500 songs), you should use an SSD to retrieve fingerprints from the database fast enough (it may not be as fast as fetching the data from RAM, but it will definitely be faster than an HDD).
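
If anyone wants to check how much of the fingerprint table MySQL can actually keep cached in memory between restarts, one thing to look at (my own suggestion, not something dejavu exposes) is the InnoDB buffer pool size. A small sketch with python-mysqldb, with placeholder connection parameters:

```python
import MySQLdb

# Inspect the InnoDB buffer pool size, which controls how much of the
# fingerprints table MySQL can keep cached in RAM.
conn = MySQLdb.connect(host="localhost", user="root", passwd="secret", db="dejavu")
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
name, value = cur.fetchone()
print("%s = %.1f MB" % (name, int(value) / 1024.0 / 1024.0))
conn.close()
```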

Here is my Stack Overflow question about this topic. It has not been answered yet, but if you want to follow it you can find it here: https://stackoverflow.com/questions/58058304/python-mysqldb-execute-took-so-much-time It also has detailed information about the problem and a list of the things I tried while attempting to solve it.

Also, I would really like to see how you set up PostgreSQL on your system. If you used any branch or pull request for it, please let me know, thank you.

mauricio-repetto commented 4 years ago

@sagaya / @omertoptas: In all my local tests, the maximum_filter function was the bottleneck. I've improved the current dejavu code a little bit in the following PR: https://github.com/worldveil/dejavu/pull/205, which makes dejavu available for Python 3.6.6 and usable with PostgreSQL or MySQL. I've added new features as well, as you will see, but feel free to try it and let me know if it helps with your times.

Thanks,

worldveil commented 4 years ago

@mauriciorepetto's wonderful work has now been merged, and it helps a lot with this.