worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License

Optimize maximum filter #203

Open vincent-163 opened 4 years ago

vincent-163 commented 4 years ago

The line https://github.com/worldveil/dejavu/blob/d2b8761eb39f8e2479503f936e9f9948addea8ea/dejavu/fingerprint.py#L98 takes up around 80% of processing time (by my rough estimate), which makes it the bottleneck in fingerprinting.

The maximum filter loops over all positions and takes the maximum of all values in a diamond-like area centered at each position. The time complexity of the naive implementation is O(N*M*W^2), where N is the number of windows, M is the number of frequency bands, and W is the window size (PEAK_NEIGHBORHOOD_SIZE=20).
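For illustration, a minimal sketch of the naive approach (using a square neighborhood for simplicity; dejavu's actual footprint is diamond-shaped, but the W^2 cost per position is the same):

```python
import numpy as np

def naive_maximum_filter(spec, w):
    """Naive local-maximum filter over a (2w+1) x (2w+1) square.
    Every output value scans O(W^2) neighbors, so the total cost
    is O(N * M * W^2) for an N x M spectrogram."""
    n, m = spec.shape
    out = np.empty_like(spec)
    for i in range(n):
        for j in range(m):
            i0, i1 = max(0, i - w), min(n, i + w + 1)
            j0, j1 = max(0, j - w), min(m, j + w + 1)
            out[i, j] = spec[i0:i1, j0:j1].max()
    return out
```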

Using a data structure known as a monotonic queue, it should be possible to reduce the time complexity to linear, i.e. O(N*M), dividing the processing time of this step by up to W^2 = 400.
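A minimal 1-D sketch of the monotonic-queue idea (illustrative code, not part of dejavu): the deque keeps candidate indices with their values in decreasing order, so the current window's maximum is always at the front.

```python
from collections import deque

def sliding_window_max(xs, w):
    """Maximum of every length-w window of xs in O(len(xs)) total:
    each index enters and leaves the deque at most once."""
    q = deque()  # indices of candidates, values strictly decreasing
    out = []
    for i, x in enumerate(xs):
        while q and xs[q[-1]] <= x:  # smaller values can never win again
            q.pop()
        q.append(i)
        if q[0] <= i - w:            # front index slid out of the window
            q.popleft()
        if i >= w - 1:
            out.append(xs[q[0]])
    return out
```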

The algorithm is implemented in scipy as scipy.ndimage.filters.maximum_filter1d. By applying it twice, first along the X axis and then along the Y axis, we get the maximum over a rectangle. The drawback is that this filter is similar to, but not the same as, the original one (a square footprint instead of a diamond), and the impact on accuracy is unknown. It is technically possible to compute the original diamond-shaped maximum filter in linear time, but it is harder to implement.
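A sketch of the separable version (the size parameter here is an assumption: a full window of 2*PEAK_NEIGHBORHOOD_SIZE+1 bins, matching the diameter of dejavu's diamond footprint):

```python
import numpy as np
from scipy.ndimage import maximum_filter1d

def rectangle_maximum_filter(spec, size):
    """Two 1-D running-maximum passes give the maximum over a
    size x size square in O(N*M) total, independent of size."""
    tmp = maximum_filter1d(spec, size=size, axis=0)
    return maximum_filter1d(tmp, size=size, axis=1)

# Candidate peaks are the bins equal to their neighborhood maximum, e.g.:
# local_max = spec == rectangle_maximum_filter(spec, 2 * 20 + 1)
```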

A simple benchmark shows that the time to scan all 5 files in the mp3 folder dropped from 2m55s to only 40s, roughly a 4.4x improvement. Note that the number of fingerprints dropped from 494217 to 353834 because of the different filter shape, and the impact on accuracy is unknown. I expect some further improvement from removing the binary erosion, but on my machine the bottleneck has already moved from Python to MySQL. For indexing purposes it might be better to use an embedded database such as RocksDB to squeeze out full performance.

omertoptas commented 4 years ago

> but on my machine it seems that the bottleneck has already moved from Python to MySQL

I am pretty sure that the real bottleneck is the part where the hashes are written to the MySQL database.

vincent-163 commented 4 years ago

> > but on my machine it seems that the bottleneck has already moved from Python to MySQL
>
> I am pretty sure that the real bottleneck is the part where the hashes are written to the MySQL database.

Tests on my machine show that about 80% of the time is spent on this particular step. If that is not the case for you, feel free to benchmark it and share your results.

If that is not the case, the only possible reason I can think of is that you're running the database on a magnetic disk. You should use an SSD instead: while SSDs are only about 4x faster in terms of bandwidth, they can be up to 100x faster in terms of random I/O.

mauricio-repetto commented 4 years ago

I think it is enough to change the filter shape from struct = generate_binary_structure(2, 1) to struct = generate_binary_structure(2, 2); with that change my times were comparable to the ones you are presenting, @vincent-163. And this way the same filter shape is preserved for both the maximum filter and the erosion operation, which your current approach does not do (see the sketch below).
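For reference, a quick look at the two structuring elements (illustrative snippet; PEAK_NEIGHBORHOOD_SIZE=20 is the value quoted above):

```python
from scipy.ndimage import generate_binary_structure, iterate_structure

# Connectivity 1 is the diamond dejavu uses; connectivity 2 is a full
# 3x3 square, whose iterated footprint matches what two passes of
# maximum_filter1d compute.
diamond = generate_binary_structure(2, 1)  # True at the 4-neighbors only
square = generate_binary_structure(2, 2)   # all 9 entries True

# Dilating the structure with itself grows it to a (2*20+1) x (2*20+1)
# footprint: a large diamond in one case, a full square in the other.
footprint = iterate_structure(square, 20)
print(footprint.shape)  # (41, 41)
```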

mauricio-repetto commented 4 years ago

@worldveil is there any reason for using, for the maximum_filter function, the kernel built with iterate_structure from the example in the scipy docs? And although it does not take much time, I'm still not sure what the purpose of the erosion operation is, or why its result is combined with the maximum filter output.
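For context, a hedged sketch of that scipy-docs-style recipe (names are illustrative, not necessarily dejavu's exact code); the usual purpose of the erosion is to mask out flat silent regions, since a zero bin surrounded by zeros also equals its neighborhood maximum and would otherwise be reported as a peak:

```python
import numpy as np
from scipy.ndimage import (binary_erosion, generate_binary_structure,
                           iterate_structure, maximum_filter)

def detect_peaks(spec, peak_neighborhood_size=20):
    struct = generate_binary_structure(2, 1)
    neighborhood = iterate_structure(struct, peak_neighborhood_size)

    # Candidate peaks: bins equal to the maximum over their neighborhood.
    local_max = maximum_filter(spec, footprint=neighborhood) == spec

    # Flat zero regions pass the test above trivially; eroding the zero
    # background marks them so they can be excluded from the peaks.
    background = (spec == 0)
    eroded_background = binary_erosion(background, structure=neighborhood,
                                       border_value=1)
    return local_max & ~eroded_background
```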

mauricio-repetto commented 4 years ago

@vincent-163 I've added the changes I mentioned to my PR here: https://github.com/worldveil/dejavu/pull/205, which I created to make dejavu work on Python 3.6.6 with support for both PostgreSQL and MySQL. I've also added a couple more features; feel free to try it and let me know your thoughts.

Thanks, Mauricio.