I've investigated these areas but haven't yet implemented anything for them, even though I am considering it.
You might also be able to work with similarity. For example, add these lines to demo 2 after computing the continuous embeddings (cont_embeds):
import matplotlib.pyplot as plt

# Pairwise similarity between all partial embeddings (one row of cont_embeds per partial)
plt.imshow(cont_embeds @ cont_embeds.T)
plt.show()
This plots the similarity matrix of the partial embeddings. Clearly you can detect some speakers there by looking for patterns of high similarity.
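A minimal sketch of turning that visual pattern into numbers, assuming cont_embeds is a 2-D array with one partial embedding per row. The helper names and the 0.6 threshold are hypothetical, and flagging a speaker change whenever two consecutive partials become dissimilar is just one simple heuristic, not Resemblyzer's own method.

```python
import numpy as np

def similarity_matrix(embeds: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of rows of `embeds`."""
    # Normalize rows defensively; for unit-length rows this changes nothing.
    unit = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    return unit @ unit.T

def change_points(embeds: np.ndarray, threshold: float = 0.6) -> list:
    """Indices i where partial i looks like a different speaker than partial i - 1.

    Hypothetical heuristic: flag a change when the cosine similarity between
    consecutive partial embeddings drops below `threshold`.
    """
    sim = similarity_matrix(embeds)
    # The first super-diagonal holds sim[i, i + 1] for i = 0..n-2.
    consecutive = np.diag(sim, k=1)
    return [i + 1 for i, s in enumerate(consecutive) if s < threshold]

# Toy example: two "speakers" pointing in orthogonal directions.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
print(change_points(np.stack([a, a, a, b, b, a])))  # [3, 5]
```

Boundaries found this way are only resolved to the spacing between consecutive partials, so a higher partial rate gives finer (but more expensive) segmentation.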
The sounddevice module can record audio and stream it in real time into numpy arrays, so you can work with that. You can then decompose the embed_utterance function to achieve your goal. Define a maximum duration for your audio (it can be an order of magnitude higher than necessary; that's not a problem) and compute the wav slices based on that length: https://github.com/resemble-ai/Resemblyzer/blob/master/resemblyzer/voice_encoder.py#L141. From the wav slices, you will know when you can grab a partial wav from the numpy array being streamed to. For each partial wav, compute a single spectrogram and forward it (with a batch size of 1), and you will get a partial embedding. Keep doing this while the audio is being recorded.

This is a demo I meant to make too, but it's certainly more work than the other 5. Hope we'll get there.
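A rough sketch of that streaming loop. It assumes the defaults we believe Resemblyzer uses (16 kHz audio, a 10 ms mel hop, 160-frame partials, 1.3 partials per second); check resemblyzer/hparams.py and compute_partial_slices before relying on them. wav_slices_for is a simplified, hypothetical stand-in for VoiceEncoder.compute_partial_slices, and embed_partial is a placeholder callable: with Resemblyzer it would compute the mel spectrogram of the partial wav (trimmed to 160 frames if needed), add a batch dimension and forward it through the encoder.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

import numpy as np

# Assumed to mirror Resemblyzer's defaults; verify against resemblyzer/hparams.py.
SAMPLING_RATE = 16000
SAMPLES_PER_FRAME = SAMPLING_RATE * 10 // 1000  # 10 ms mel hop -> 160 samples
PARTIAL_N_FRAMES = 160                           # 160 frames -> 1.6 s per partial

def wav_slices_for(max_samples: int, rate: float = 1.3) -> List[slice]:
    """Hypothetical simplification of compute_partial_slices: the sample
    range of every full partial that fits in `max_samples` samples."""
    frame_step = int(round((SAMPLING_RATE / rate) / SAMPLES_PER_FRAME))
    n_frames = max_samples // SAMPLES_PER_FRAME
    return [
        slice(start * SAMPLES_PER_FRAME, (start + PARTIAL_N_FRAMES) * SAMPLES_PER_FRAME)
        for start in range(0, n_frames - PARTIAL_N_FRAMES + 1, frame_step)
    ]

def stream_partial_embeddings(
    chunks: Iterable[np.ndarray],
    embed_partial: Callable[[np.ndarray], np.ndarray],
    max_seconds: float = 600.0,
) -> Iterator[Tuple[slice, np.ndarray]]:
    """Yield (wav_slice, embedding) as soon as each partial is fully recorded."""
    max_samples = int(max_seconds * SAMPLING_RATE)
    slices = wav_slices_for(max_samples)
    buffer = np.zeros(max_samples, dtype=np.float32)  # preallocated recording buffer
    filled = 0
    next_idx = 0
    for chunk in chunks:
        # Flatten e.g. (frames, 1) arrays and drop audio beyond the max duration.
        chunk = np.asarray(chunk, dtype=np.float32).reshape(-1)
        n = min(len(chunk), max_samples - filled)
        buffer[filled:filled + n] = chunk[:n]
        filled += n
        # Emit every partial whose last sample has now arrived.
        while next_idx < len(slices) and slices[next_idx].stop <= filled:
            s = slices[next_idx]
            yield s, embed_partial(buffer[s])
            next_idx += 1
```

In a real setup, chunks could be the arrays delivered by a sounddevice input stream callback, passed through a queue. Preallocating the buffer for the maximum duration avoids re-concatenating the whole recording on every chunk.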
Thanks for your detailed explanations.
(number_of_partials, embedding_size), correct?

I mean that at this point in the function, https://github.com/resemble-ai/Resemblyzer/blob/master/resemblyzer/voice_encoder.py#L151, the variable mels has shape (N, 160, 40), where N is the batch size. You will probably end up with a mel of shape (160, 40), so you will have to add an extra dimension (e.g. by doing mels[None, ...]) before forwarding the mel.
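A small numpy illustration of that extra dimension, using random values in place of a real (160, 40) mel spectrogram:

```python
import numpy as np

# Stand-in for one partial: 160 mel frames x 40 mel channels (random values).
mel = np.random.rand(160, 40).astype(np.float32)

# Add a leading batch axis; equivalent to np.expand_dims(mel, 0).
batch = mel[None, ...]
print(batch.shape)  # (1, 160, 40)

# With PyTorch installed, the batch could then become a tensor: torch.from_numpy(batch)
```

mel[None, ...] returns a view, so no data is copied.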
Got it! Thank you very much for clearing up these doubts. I will close this and post an update here when I make significant progress with unsupervised and streaming diarization. Great work once again!
Sure, it's fine if you leave it open until we figure it out.
Hi @shashankpr, any progress on this task?
Hello, could you please reference the task description in your email, as there are many?
Thank you, Lonnie Hartley
From: Nikita Popov notifications@github.com Sent: Wednesday, April 8, 2020 7:31:52 AM To: resemble-ai/Resemblyzer Resemblyzer@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [resemble-ai/Resemblyzer] Compute embeddings from stream & unsupervised diarization (#10)
Hi @nikitalpopov, I have been doing some experiments around this but haven't really had the time to implement something good. I am going to start working on it this week, and I will update you if I make any progress.
@shashankpr Can I help you with anything?
@CorentinJ @shashankpr I tried to implement it myself, but the results are poor (the diarization error rate, DER, does not get any better than 60%). Could you please check my test notebook? https://github.com/nikitalpopov/master/blob/dev/demo.ipynb
I'm posting my solution here, since I've been trying to implement a way of computing embeddings during streaming. In my use case, streaming happens by pushing audio segments as raw bytes:
import io

import numpy as np
import soundfile as sf
from resemblyzer import VoiceEncoder

encoder = VoiceEncoder()

def embed(chunk_bytes: bytes) -> np.ndarray:
    """Embed a chunk of headerless 16-bit mono PCM bytes sampled at 16 kHz."""
    # format='RAW' has no header, so rate, channels and sample format must be given.
    data, _ = sf.read(
        io.BytesIO(chunk_bytes),
        samplerate=16000,
        channels=1,
        format='RAW',
        subtype='PCM_16',
        endian='FILE',
        dtype='float32',  # embed_utterance expects a float32 waveform
    )
    return encoder.embed_utterance(data)
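If the incoming chunks are always headerless little-endian 16-bit mono PCM (an assumption; the code above leaves byte order to soundfile's endian='FILE'), the decoding step can also be sketched with numpy alone. pcm16_bytes_to_float is a hypothetical helper name.

```python
import numpy as np

def pcm16_bytes_to_float(chunk_bytes: bytes) -> np.ndarray:
    """Decode headerless little-endian 16-bit mono PCM into float32 in [-1, 1)."""
    # Ignore a trailing odd byte so the buffer length is a multiple of 2.
    n = len(chunk_bytes) // 2 * 2
    # '<i2' = little-endian signed 16-bit integers.
    samples = np.frombuffer(chunk_bytes[:n], dtype="<i2")
    return samples.astype(np.float32) / 32768.0
```

encoder.embed_utterance(pcm16_bytes_to_float(chunk_bytes)) would then stand in for the sf.read call, removing the soundfile dependency for this path.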
An example of this code's output, after projecting the embeddings with PCA, was attached as an image.
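One way to produce such a 2-D view, sketched with plain numpy: center the embeddings and project them onto the first two right singular vectors, which is PCA. pca_2d is a hypothetical helper; sklearn.decomposition.PCA(n_components=2) would do the same job.

```python
import numpy as np

def pca_2d(embeds: np.ndarray) -> np.ndarray:
    """Project row vectors onto their first two principal components."""
    centered = embeds - embeds.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

With one embedding per chunk stacked into an array, plt.scatter(points[:, 0], points[:, 1]) on points = pca_2d(embeddings) shows whether chunks from the same speaker group together.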
Hi, great work and a great repo. Your code and examples helped me understand the flow very easily. I am currently working on a speaker identification task in which I want to detect "who spoke when" with low latency. There are two problems I need to solve, and I was wondering whether you have already worked on them or plan to in the future. If not, I would be glad to contribute them to your repo as a PR. The tasks are computing embeddings from an audio stream and unsupervised diarization.
I know they can be done with a few tweaks, but I would like to hear your insight if you have already worked on them or have ideas about them. Thanks!
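For the low-latency "who spoke when" case, one simple unsupervised option is greedy online clustering: compare each new partial embedding with a running centroid per speaker, and open a new speaker when nothing is similar enough. This is a generic technique, not something Resemblyzer provides; OnlineSpeakerAssigner and its 0.75 threshold are hypothetical and would need tuning on real data.

```python
import numpy as np

class OnlineSpeakerAssigner:
    """Hypothetical helper: greedy online clustering of speaker embeddings."""

    def __init__(self, threshold: float = 0.75):
        self.threshold = threshold  # hypothetical cosine-similarity cutoff
        self.centroids = []         # running mean of unit embeddings per speaker
        self.counts = []            # number of embeddings assigned to each speaker

    def assign(self, embed: np.ndarray) -> int:
        """Return a speaker index for `embed`, creating a new speaker if needed."""
        e = embed / np.linalg.norm(embed)
        if self.centroids:
            # Cosine similarity to each centroid (centroids are not unit-length).
            sims = [float(c @ e) / np.linalg.norm(c) for c in self.centroids]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                # Update the running mean of the matched speaker.
                n = self.counts[best]
                self.centroids[best] = (self.centroids[best] * n + e) / (n + 1)
                self.counts[best] = n + 1
                return best
        self.centroids.append(e)
        self.counts.append(1)
        return len(self.centroids) - 1
```

Greedy assignment never revises past labels, which keeps latency low but means early mistakes persist; an offline clustering pass over all embeddings can correct them once the recording ends.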