spotify / echoprint-codegen

Codegen for Echoprint
http://echoprint.me/codegen

actual hash rate is about half what the Ellis-Whitman-Porter-11 paper states #82

Open adrianomitre opened 9 years ago

adrianomitre commented 9 years ago

In section 2 of the paper ECHOPRINT - AN OPEN MUSIC IDENTIFICATION SERVICE, it is stated that "the overall hash rate is approximately 8 (bands) × 1 (onset per second) × 6 (hashes per onset) ≈ 48 hashes/sec". However, every song I have run echoprint-codegen on has produced a much lower figure. The rate is always in the 23-28 hashes per second range, with an average slightly above 25. I compute the hash rate as H/L, where H is the total number of hashes produced for a song and L is the song length in seconds. L can be estimated by dividing the maximum hash frame by the frame rate, 11025/256 ≈ 43.07 frames per second (each frame spans 256 samples at 11,025 Hz, about 23.2 ms).
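To make the comparison concrete, here is a minimal sketch of the two numbers being contrasted: the paper's back-of-the-envelope estimate and the measured H/L. The helper names `predicted_rate`, `measured_rate` and `FRAMES_PER_SECOND` are hypothetical. The sketch assumes, as above, that the song length can be recovered from the largest frame index at 11025/256 frames per second.

```ruby
# Hypothetical helpers contrasting the paper's estimate with the measured rate.

# Frames per second on the codegen time axis: 11,025 Hz / 256 samples per frame.
FRAMES_PER_SECOND = 11_025 / 256.0

# Paper's estimate: bands x onsets per second x hashes per onset.
def predicted_rate(bands, onsets_per_second, hashes_per_onset)
  bands * onsets_per_second * hashes_per_onset
end

# Measured rate H / L, where L (in seconds) is estimated from the largest frame index.
def measured_rate(total_hashes, max_frame)
  length_seconds = max_frame / FRAMES_PER_SECOND
  total_hashes / length_seconds
end

puts predicted_rate(8, 1, 6)      # → 48
puts measured_rate(6_400, 11_025) # 6,400 hashes over 256 s → 25.0
```

A song in the reported range, at about 25 hashes/s, therefore sits at roughly half of the predicted 48.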

My fork of codegen, which prints the hashes unhashed in [frame, band, delta1, delta2] JSON format, is public. The following Ruby code computes the "hash" rate for each file given as an argument:

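```ruby
#!/usr/bin/env ruby
# Prints the mean code ("hash") rate, in codes per second, for each
# codegen JSON file given on the command line.

require 'json'

# The fork's output is a JSON array whose first element has a "code" field;
# that field is itself a JSON string of [frame, band, delta1, delta2] entries.
def get_code(filename)
  JSON.parse(JSON.parse(File.read(filename))[0]["code"])
end

# Frames per second: 11,025 Hz sample rate / 256 samples per frame ≈ 43.07.
TimeQuantum = 11_025 / 256.0

# Mean code rate in codes per second: number of codes divided by the
# song length, estimated from the largest frame index.
def mean_code_rate(code)
  max_frame = code.map { |fr, _b, _d1, _d2| fr }.max
  code.size / (max_frame / TimeQuantum)
end

ARGV.each do |filename|
  r = mean_code_rate(get_code(filename))
  puts "#{"%.2f" % r} ; #{filename}"
end
```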
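One way to see where the missing hashes go might be to split the overall rate into the paper's factors: onsets per second per band and hashes per onset. The following is a hypothetical diagnostic, not part of codegen or the fork. It assumes that each [frame, band, delta1, delta2] entry was emitted by the onset at `frame` in `band`, so that distinct (frame, band) pairs count onsets. It also takes the band count of 8 from the paper.

```ruby
# Hypothetical diagnostic: decompose hashes/s into bands x onsets/s/band x
# hashes/onset. Assumes each entry's (frame, band) identifies the onset
# that produced it.

FPS = 11_025 / 256.0 # frames per second (11,025 Hz / 256 samples per frame)

def rate_breakdown(code, bands: 8)
  seconds = code.map(&:first).max / FPS
  # Distinct (frame, band) pairs approximate the onsets that produced hashes.
  onsets = code.map { |fr, b, _d1, _d2| [fr, b] }.uniq.size
  {
    hashes_per_second: code.size / seconds,
    onsets_per_second_per_band: onsets / seconds / bands,
    hashes_per_onset: code.size.fdiv(onsets)
  }
end

# Usage on a tiny synthetic code list (not real codegen output).
sample = [[0, 0, 1, 2], [0, 0, 1, 3], [11_025, 1, 4, 5]]
p rate_breakdown(sample)
```

Because hashes/s equals bands × onsets/s/band × hashes/onset, comparing the last two values against the paper's 1 and 6 would indicate whether the shortfall comes from fewer onsets or from fewer hashes per onset.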
adrianomitre commented 9 years ago

It is stated in section 2 of the paper that "the overall hash rate is approximately [...] 48 hashes/sec". Then, in section 3, it is stated that "A 30 second query has about 800 hash keys." (800/30 ≈ 26.7 hashes/sec). Only one of these statements can be correct, and given the results detailed in the previous comment, I would say it is the second.
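The inconsistency between the two figures quoted from the paper can be checked with a few lines of arithmetic. The sketch uses only the numbers from sections 2 and 3. The variable names are illustrative.

```ruby
# Cross-check of the paper's two figures (values quoted in the comment above).
paper_rate = 8 * 1 * 6 # section 2: bands x onsets/s x hashes/onset
query_seconds = 30
query_hashes = 800     # section 3: "about 800 hash keys"

implied_rate = query_hashes.fdiv(query_seconds)
expected_hashes = paper_rate * query_seconds

puts format("implied rate: %.1f hashes/s", implied_rate) # → implied rate: 26.7 hashes/s
puts "hashes expected at 48/s: #{expected_hashes}"       # → hashes expected at 48/s: 1440
```

At 48 hashes/s, a 30 second query would carry about 1440 hashes, not 800. The section 3 figure is in line with the 23-28 hashes/s measured above.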