Closed dennyabrain closed 5 months ago
segment.cross_similarity
- https://librosa.org/doc/main/generated/librosa.segment.cross_similarity.html#librosa.segment.cross_similarity
Acoustic fingerprint
- https://en.wikipedia.org/wiki/Acoustic_fingerprint
Spectral Analysis
- librosa - Mel-frequency cepstral coefficients (MFCCs) - https://librosa.org/doc/main/generated/librosa.feature.mfcc.html
Shazam Like App
- https://github.com/MarwaAbdelAal/Shazam-like-app/blob/master/main.py
End of Week Deliverables after Status Check:
We have a working operator that computes the fingerprint of a given audio file using signal processing.
It first computes a spectrogram of the audio file, then derives the fingerprint from it as a list of (positive) frequencies, scaled to [0, 1], at which the local periodogram has a peak.
The input must be a .wav file. [Article Link] [GitHub]
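The fingerprinting idea described above can be sketched in a few lines. This is a simplified illustration, not the operator's actual code: it uses only NumPy, keeps a single strongest peak per frame rather than every local peak, and the names `fingerprint`, `frame_len`, and `hop` are hypothetical.

```python
import numpy as np

def fingerprint(samples, frame_len=1024, hop=512):
    """Return one peak frequency per frame, scaled to [0, 1] (1 = Nyquist)."""
    samples = np.asarray(samples, dtype=np.float64)
    window = np.hanning(frame_len)
    peaks = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len] * window
        # Local periodogram: squared magnitude of the one-sided FFT
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Skip the DC bin so only positive frequencies are considered
        k = 1 + int(np.argmax(power[1:]))
        # Bin k sits at k / (frame_len / 2) of the Nyquist frequency
        peaks.append(k / (frame_len / 2))
    return peaks

# Usage: a 1 kHz tone sampled at 8 kHz should peak at 1000 / 4000 = 0.25
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
fp = fingerprint(tone)
```

Scaling by the Nyquist frequency makes the fingerprint independent of the sample rate, which is one plausible reading of "scaled to [0, 1]" in the description.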
Given an audio file, this method computes a 2048-dimensional vector using PANNs (Pretrained Audio Neural Networks). PANNs are CNNs pre-trained on a large amount of audio data and have been used for audio tagging and sound event detection. They have also been fine-tuned for several audio pattern recognition tasks, where they outperformed several previous state-of-the-art systems.
Audio embeddings are often generated from spectrograms or other audio signal features. A spectrogram is a visual representation of the frequency content of an audio signal over time, and extracting features from it is a crucial step in audio signal processing for machine learning. The identified features in this context encompass three specific types:
All audio files must be in the .wav file format. Once the operator processes a file, it returns a vector of dimension 2048.
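A rough sketch of the .wav-in, 2048-vector-out contract described above. It is not the operator's code: `load_wav_mono` and `embed` are hypothetical helpers, the sketch only handles 16-bit PCM, it does no resampling, and the `tagger` argument is assumed to be any object with a PANNs-style `inference(audio)` method returning `(clipwise_output, embedding)`.

```python
import wave
import numpy as np

EMBEDDING_DIMS = 2048  # dimension stated in the text

def load_wav_mono(path):
    """Read a 16-bit PCM .wav file into float32 mono samples in [-1, 1]."""
    if not path.lower().endswith(".wav"):
        raise ValueError("expected a .wav file")
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    # Average the interleaved channels down to mono
    return samples.reshape(-1, channels).mean(axis=1)

def embed(path, tagger):
    """Return the embedding vector for one file from a PANNs-style tagger."""
    audio = load_wav_mono(path)[None, :]  # shape (batch=1, samples)
    _clipwise_output, embedding = tagger.inference(audio)
    vector = np.asarray(embedding)[0]
    if vector.shape != (EMBEDDING_DIMS,):
        raise ValueError(f"unexpected embedding shape {vector.shape}")
    return vector
```

If the `panns_inference` package is installed, `AudioTagging(checkpoint_path=None, device="cpu")` from that package should be usable as the `tagger`. That pairing is an assumption here, and its model expects 32 kHz audio, so files at other sample rates would need resampling first.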
I index and search for this vector using the curl commands listed below.
Step 1 - Create an index called "audio" with specific mappings
curl -X PUT "es:9200/audio" -H 'Content-Type: application/json' -d '{"mappings": {"_source": {"excludes": ["audio-embedding"]},"properties": {"audio-embedding": {"type": "dense_vector","dims": 2048,"index": true,"similarity": "cosine"},"path": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"timestamp": {"type": "date"},"title": {"type": "text"},"genre": {"type": "text"}}}}'
Step 2 - List all the indices to check that the audio index was created
curl -X GET "http://es:9200/_cat/indices?v"
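Step 3 - Store a vector in the audio index (the example vector is shortened to 5 values here; a real document needs all 2048 values to match the mapping)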
curl -X POST "es:9200/audio/_doc" -H 'Content-Type: application/json' -d '{"audio-embedding": [0.0, 0.0, 0.029310517013072968, 0.02595067210495472, 0.023528538644313812], "path": "path1", "timestamp": "2024-02-07T12:00:00", "title": "title1", "genre": "genre1"}'
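Elasticsearch rejects a document whose dense_vector length differs from the `dims` declared in the mapping, so it can help to check the length before sending. The following is a minimal sketch of building the same request body in Python. `build_doc` is a hypothetical helper, not part of the operator, and it only produces the JSON without sending it.

```python
import json

DIMS = 2048  # must match "dims" in the index mapping

def build_doc(embedding, path, timestamp, title, genre):
    """Return the JSON body for POST /audio/_doc, validating the vector size."""
    if len(embedding) != DIMS:
        raise ValueError(f"expected {DIMS} values, got {len(embedding)}")
    return json.dumps({
        "audio-embedding": [float(x) for x in embedding],
        "path": path,
        "timestamp": timestamp,
        "title": title,
        "genre": genre,
    })
```

The returned string can be passed to curl with `-d`, or to any HTTP client, in place of the hand-written body.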
Step 4 - Search for the indexed vector. Documents are ranked by cosine similarity to the query vector; 1.0 is added because Elasticsearch does not allow negative scores
curl -X GET "es:9200/audio/_search" -H 'Content-Type: application/json' -d '{"query": {"script_score": {"query": {"match_all": {}}, "script": {"source": "cosineSimilarity(params.query_vector, '"'"'audio-embedding'"'"') + 1.0", "params": {"query_vector": [0.0, 0.0, 0.029310517013072968, 0.02595067210495472, 0.023528538644313812]}}}}}'
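To make the scoring in the script concrete, here is a small pure-Python sketch of the same formula, cosine similarity plus 1.0. It is only an illustration of the arithmetic, not how Elasticsearch computes it internally, and `es_style_score` is a hypothetical name.

```python
import math

def es_style_score(query_vector, doc_vector):
    """Cosine similarity shifted by 1.0, giving a score in [0, 2]."""
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    norm_q = math.sqrt(sum(q * q for q in query_vector))
    norm_d = math.sqrt(sum(d * d for d in doc_vector))
    return dot / (norm_q * norm_d) + 1.0

# Usage: querying with the stored vector itself gives the top score, close to 2.0
vec = [0.0, 0.0, 0.029310517013072968, 0.02595067210495472, 0.023528538644313812]
score = es_style_score(vec, vec)
```

Orthogonal vectors score 1.0 and opposite vectors score 0.0, so the shift keeps every result non-negative.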
The pull request for this operator - https://github.com/tattle-made/feluda/pull/59
Overview
Acceptance Criteria