worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License
6.33k stars 1.43k forks

Fingerprinting using an API #110

Open wassimseif opened 7 years ago

wassimseif commented 7 years ago

Is it possible to fingerprint a song from a client app, and does anyone have an implementation? Scenario:

  1. Administrators upload the songs (CMS, command line, ...).
  2. A mobile app (iOS or Android) uses sockets or a RESTful API to stream a song to the server.
  3. After x seconds, the server replies with either a song ID/name or an error if fingerprinting fails.

Any implementation would be helpful, even if it's not fully working. Ideas are highly appreciated.

lordkyzr commented 7 years ago

No matter how the audio gets to the server, once it's there you can process it. If you are using an API, return the client an interim ID they can use to check on the status of their request. When it's done, the status response can tell them the endpoint and the ID to call to get their other data points. Don't worry about notifying the client; let the client request it. At least that's how I implemented it for what I needed.
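The interim-ID pattern described above can be sketched roughly as follows. This is not code from the thread: `RecognitionJobs` and the `recognize` callable are hypothetical names, the job table lives in memory only, and a real service would put `submit` and `status` behind HTTP endpoints.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class RecognitionJobs:
    """In-memory job table: submit audio, get an interim ID, poll for status."""

    def __init__(self, recognize, max_workers=2):
        # `recognize` is a hypothetical callable: file path -> result dict.
        self._recognize = recognize
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def submit(self, audio_path):
        """Queue a file for recognition and return the interim job ID."""
        job_id = uuid.uuid4().hex
        future = self._pool.submit(self._recognize, audio_path)
        with self._lock:
            self._futures[job_id] = future
        return job_id

    def status(self, job_id):
        """What a status endpoint could return to a polling client."""
        with self._lock:
            future = self._futures.get(job_id)
        if future is None:
            return {"state": "unknown"}
        if not future.done():
            return {"state": "pending"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "result": future.result()}
```

The client keeps calling `status(job_id)` until the state is no longer `pending`, so the server never has to push anything.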

wassimseif commented 7 years ago

What I'm thinking about:

  1. Sockets to communicate between the server and clients.
  2. The sockets activate dejavu and try to recognise the stream. (Which recognizer should I use here?)
  3. After this point, how the response is returned is irrelevant.

Can you show me the implementation? I'm stuck because I noticed that there isn't any StreamRecognizer, just the FileRecognizer and MicrophoneRecognizer.

lordkyzr commented 7 years ago

Apologies for not getting back to you; it's been busy at work. Unfortunately I don't have any code to show you because of the client's licensing, but I can tell you what we did.

I never tried doing anything with sockets personally. I got around that by just having the client upload the files they want to compare to the server and storing them. The web server fired off a message to a processing queue, and a service would pick the file up and use the FileRecognizer to process it. When the service that implements dejavu was done with the file, it would fire a message that was picked up by a different service, which would pop a notification to the user that their file was processed. My use case never needed real time, because the client wanted to batch a ton of sound files to see which ones were similar.
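A rough single-process sketch of that queue-and-worker pipeline, using only the standard library. The real setup described above used separate services and a message queue; here `queue.Queue` stands in for both, and `recognize` is a hypothetical wrapper around a file-based recognizer.

```python
import queue
import threading

STOP = object()  # sentinel that tells the worker to shut down


def recognition_worker(jobs, notifications, recognize):
    """Pull (user, path) messages off `jobs`, recognize, emit a notification."""
    while True:
        message = jobs.get()
        if message is STOP:
            break
        user, path = message
        try:
            result = recognize(path)  # hypothetical file-based recognizer call
            notifications.put({"user": user, "path": path, "result": result})
        except Exception as exc:  # keep the worker alive on a bad file
            notifications.put({"user": user, "path": path, "error": str(exc)})


def run_batch(uploads, recognize):
    """Process a batch of (user, path) uploads and return the notifications."""
    jobs, notifications = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=recognition_worker, args=(jobs, notifications, recognize)
    )
    worker.start()
    for upload in uploads:
        jobs.put(upload)
    jobs.put(STOP)
    worker.join()
    # The worker has exited, so the notification queue is no longer changing.
    return [notifications.get() for _ in range(notifications.qsize())]
```

With a single worker the notifications come back in upload order; with several workers they would not, which is one reason to carry the user and path in each message.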

If I had to guess, I would think that the MicrophoneRecognizer is the closest you are going to get to a socket in dejavu's current state. I have never looked at that part of the code, but it has to be taking the input from the mic and streaming it through dejavu a piece at a time. You would have to rewrite it to take the input from a socket instead of a microphone, but I would think you can reuse some of the existing code and just add the socket layer.
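Only the socket layer of that idea is sketched below; how the samples would be handed to dejavu is left open, since that depends on its internals. The function name and the wire format (interleaved 16-bit little-endian PCM, fixed channel count and sample rate agreed in advance) are assumptions for illustration.

```python
import socket
import struct


def read_pcm_from_socket(conn, channels=2, seconds=5, sample_rate=44100,
                         chunk_bytes=8192):
    """Read up to `seconds` of interleaved 16-bit little-endian PCM from `conn`
    and return one list of samples per channel."""
    frame_bytes = 2 * channels
    wanted = seconds * sample_rate * frame_bytes
    buffer = bytearray()
    while len(buffer) < wanted:
        chunk = conn.recv(min(chunk_bytes, wanted - len(buffer)))
        if not chunk:  # client closed the connection early
            break
        buffer.extend(chunk)
    usable = len(buffer) - (len(buffer) % frame_bytes)  # drop a partial frame
    samples = struct.unpack("<%dh" % (usable // 2), bytes(buffer[:usable]))
    # De-interleave: sample i belongs to channel i % channels.
    return [list(samples[c::channels]) for c in range(channels)]
```

The per-channel lists play the role that microphone buffers would play in a microphone-based recognizer; a few seconds are collected first because a fingerprint needs a window of audio, not a single packet.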

eoffermann commented 7 years ago

IMHO this is strictly a FileRecognizer type of task unless you intend to do a fairly significant fork of the project. Don't stream to the server. Record 3-5 seconds, optionally compress, then upload. Wash, rinse, repeat as needed. Even then, it's messy.
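The "record a few seconds, upload, repeat" loop needs each clip to be a self-contained file. A minimal sketch of the splitting step, assuming the client already has raw 16-bit PCM in memory (the function name and default parameters are illustrative, not from the thread):

```python
import io
import wave


def split_into_wav_clips(pcm, clip_seconds=5, sample_rate=44100,
                         channels=1, sample_width=2):
    """Split raw PCM bytes into self-contained WAV clips of `clip_seconds`."""
    bytes_per_clip = clip_seconds * sample_rate * channels * sample_width
    clips = []
    for start in range(0, len(pcm), bytes_per_clip):
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(channels)
            wav.setsampwidth(sample_width)
            wav.setframerate(sample_rate)
            wav.writeframes(pcm[start:start + bytes_per_clip])
        clips.append(out.getvalue())  # complete WAV file, header included
    return clips
```

Each element of the returned list can be uploaded on its own and handed to a file-based recognizer on the server; the last clip may be shorter than the others.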

While I really dig Dejavu as reference code, my solution was to rewrite the entire thing in C# using dejavu as a logical guideline so that we can fingerprint on the device, and perform recognition both on the device and on the cloud. That's not a small project either - there's a lot of SciPy stuff that doesn't have a direct equivalent, for instance, and you end up having to understand pretty much everything about how it works - but it adds flexibility. Once the code is .NET-friendly it's easy to virtualize, and on client side it's Unity-ready. So win-win. What we end up with isn't exactly dejavu compatible. I don't think we'd get matching fingerprints between the two solutions, though we could probably make that happen - that just wasn't the goal.

julzelements commented 6 years ago

I managed to get this situation working for my project. I hosted dejavu on a very simple Flask app. It took a lot of fiddling (I was completely new to hosting/servers/cloud stuff). My final prototype had:

  1. iPhone recording 5 secs of .wav
  2. sending it to the EC2 instance
  3. doing the lookup in a MySQL db
  4. sending the result back to the phone
  5. displaying to the user.
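The server side of such a prototype can be sketched without any framework. The version below is a plain WSGI app rather than the Flask app described above, it accepts the WAV as the raw request body rather than a multipart form, and `recognize` is a hypothetical callable that takes a file path and returns a dict (or `None` for no match).

```python
import json
import os
import tempfile


def make_app(recognize):
    """Build a WSGI app around a hypothetical path -> dict/None recognizer."""

    def app(environ, start_response):
        if environ["REQUEST_METHOD"] != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        audio = environ["wsgi.input"].read(length)
        # Write the clip to disk so a file-based recognizer can open it.
        fd, path = tempfile.mkstemp(suffix=".wav")
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(audio)
            match = recognize(path)
        finally:
            os.remove(path)
        status = "200 OK" if match else "404 Not Found"
        body = json.dumps(match or {"error": "no match"}).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    return app
```

For local experiments the app can be served with `wsgiref.simple_server.make_server("", 8000, make_app(recognize))`; a Flask route that saves the uploaded file and calls the recognizer would fill the same role.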

wassimseif commented 6 years ago

@julzelements Yes, that's what I was thinking about. How long does this process take on a decent internet connection?

julzelements commented 5 years ago

I sent the file as an uncompressed .wav. Most of the time was spent on step 2: sending the uncompressed .wav file to the ec2 instance. This would vary depending on the internet connection. I used a multipart form POST request. I haven't worked on the project for a few months, but it used to take about 10-20 seconds? The db lookup and return was more or less instantaneous.
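A back-of-the-envelope estimate shows why the upload dominates. The recording format is not stated in the thread, so the defaults below (44.1 kHz, stereo, 16-bit, 44-byte header) are assumptions, and the estimate ignores latency and HTTP overhead.

```python
def wav_upload_seconds(clip_seconds, uplink_bits_per_second, sample_rate=44100,
                       channels=2, sample_width_bytes=2, header_bytes=44):
    """Estimate size and upload time of an uncompressed PCM WAV clip."""
    size_bytes = (header_bytes
                  + clip_seconds * sample_rate * channels * sample_width_bytes)
    return size_bytes, size_bytes * 8 / uplink_bits_per_second


# 5 s of 44.1 kHz stereo 16-bit audio over an assumed 1 Mbit/s uplink.
size, seconds = wav_upload_seconds(5, 1_000_000)
```

Under these assumptions the clip is 882,044 bytes and takes roughly 7 seconds to send. The same 5 seconds as 16 kHz mono would be 160,044 bytes, or a bit over 1 second, though whether a lower-quality clip still matches depends on how the reference songs were fingerprinted.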

wassimseif commented 5 years ago

Ahh okay, but I was looking for something more real-time, 2-3 seconds maybe. Sockets might do the trick.

caveandre commented 5 years ago

@julzelements Hi, sorry for reviving this discussion. I'm developing an app with PhoneGap, and I can record the song, send it to the server where dejavu is, and then start the lookup. The problem is that no matter how long the recording is, the result is never the correct song, so I'm wondering how you resolved this issue. Thanks!

Alfnitacoder commented 1 month ago

I tried recording and sending the recorded file over to the server using an Android app, and it worked. I'd really like to know if using a socket can achieve the same result here.