jhurliman / node-echoprint-server

A node.js implementation of the Echoprint music identification server

Inconsistent with the official version #8

Open mavenlin opened 11 years ago

mavenlin commented 11 years ago

After comparing your code with the official java code, I find the following inconsistencies:

  1. In the official code, before searching the database for codes, they call termSet.addAll(Arrays.asList(queryTerms)); which means that repeated codes in the query are counted only once. For example, the query 1 2 2 3 3 4 is converted into 1 2 3 4.
  2. The frequency of a code within a document is not counted in the official implementation. In the eval function of the official code, the freqs variable is never used; it only counts how many unique query codes the doc contains. Say a document in the database is 1 1 1 2 2 3 3 3 5 5 6 6 and the query is 1 2 3 4: the score is 3, because the document contains 1, 2 and 3. But your implementation, given the original query 1 2 2 3 3 4, will return 1*3 + 2*2 + 2*3 = 13.
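
To make the two behaviours concrete, here is a minimal sketch that reproduces the numbers from the example above. It is not taken from either code base: the function names countCodes, setOverlapScore and frequencyWeightedScore are hypothetical, and the scoring rules are only my reading of the two points in this list.

```javascript
// Hypothetical helper: count how often each code occurs in a list.
function countCodes(codes) {
  const counts = new Map();
  for (const code of codes) {
    counts.set(code, (counts.get(code) || 0) + 1);
  }
  return counts;
}

// Set-based score (the behaviour described for the official server):
// repeated query codes are dropped, and each matching code adds 1
// regardless of how often it occurs in the document.
function setOverlapScore(queryCodes, docCodes) {
  const docSet = new Set(docCodes);
  let score = 0;
  for (const code of new Set(queryCodes)) {
    if (docSet.has(code)) score += 1;
  }
  return score;
}

// Frequency-weighted score (the behaviour described for this project):
// every occurrence of a code in the query adds the number of times
// that code occurs in the document.
function frequencyWeightedScore(queryCodes, docCodes) {
  const docCounts = countCodes(docCodes);
  let score = 0;
  for (const code of queryCodes) {
    score += docCounts.get(code) || 0;
  }
  return score;
}

const query = [1, 2, 2, 3, 3, 4];
const doc = [1, 1, 1, 2, 2, 3, 3, 3, 5, 5, 6, 6];

console.log(setOverlapScore(query, doc));        // 3
console.log(frequencyWeightedScore(query, doc)); // 13
```

The second function rewards documents that repeat a matching code many times, which is exactly where the two scores diverge.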

Have you perhaps run a test showing that your implementation performs better than the official one?

jhurliman commented 11 years ago

Hi, sorry for the slow response. I don't have a lot of time to actively maintain this project right now.

I have made a few optimizations in both the algorithm and the implementation compared with the original Echoprint server (mostly for doing partial matches in long audio files), but I'm not sure whether the differences you raised are optimizations I made deliberately. They could be an oversight. I don't have the time to look into this myself, but I would be interested in seeing what impact those changes have on matching accuracy and speed. In general, I prefer accuracy over speed, but a config setting that allows this to be tuned could be useful too.
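
One possible shape for such a setting, purely as a sketch: a single scoring function with two independent switches, one per inconsistency raised above. The function name scoreDocument and the options dedupeQuery and useFrequencies are hypothetical and do not exist in this project or in the official server; the defaults are chosen only to mimic the frequency-weighted behaviour described in this issue.

```javascript
// Hypothetical options (not part of either code base):
//   dedupeQuery    - collapse repeated query codes before matching (point 1)
//   useFrequencies - weight each match by the code's count in the document (point 2)
function scoreDocument(queryCodes, docCodes, options = {}) {
  const { dedupeQuery = false, useFrequencies = true } = options;

  // Count how often each code occurs in the document.
  const docCounts = new Map();
  for (const code of docCodes) {
    docCounts.set(code, (docCounts.get(code) || 0) + 1);
  }

  // Optionally reduce the query to its unique codes.
  const terms = dedupeQuery ? [...new Set(queryCodes)] : queryCodes;

  let score = 0;
  for (const code of terms) {
    const count = docCounts.get(code) || 0;
    // Without frequencies, a matching code contributes at most 1.
    score += useFrequencies ? count : Math.min(count, 1);
  }
  return score;
}

const query = [1, 2, 2, 3, 3, 4];
const doc = [1, 1, 1, 2, 2, 3, 3, 3, 5, 5, 6, 6];

// Defaults: frequency-weighted, duplicates kept.
console.log(scoreDocument(query, doc)); // 13
// Both switches flipped: set-based scoring as described for the official server.
console.log(scoreDocument(query, doc, { dedupeQuery: true, useFrequencies: false })); // 3
```

Keeping the two switches separate would let each change be measured on its own when comparing matching accuracy and speed.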