idnavid / pyknograms

implementation of pyknogram extraction for co-channel speech analysis.

Use of code for overlap detection in multiple speakers #1

Closed prachiisc closed 3 months ago

prachiisc commented 5 years ago

Hello, will I be able to use this code on recordings with multiple speakers to detect the overlaps in between? I tried a few examples but could not detect any.

idnavid commented 5 years ago

Hi @prachiisc , Yes, the features are designed for overlap detection of 1 vs. many speakers.

The reason you're not getting great detection results could be because of the classifier you're using. In my experience you can use k-means with more than two clusters, and then look at the extreme clusters in terms of mean pyknogram features (meaning the features you end up extracting from pyknograms).

I'd be happy to help you get started if you need some minor adjustments to the code.

Navid
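The clustering suggestion above can be sketched roughly as follows. This is a minimal illustration, not code from the repository: it assumes the pyknogram-derived features are already available as an (n_segments, n_dims) array, uses a small NumPy-only k-means, and the names `kmeans` and `extreme_cluster_masks` are hypothetical. The thread does not say which extreme cluster corresponds to overlap, so both are returned and should be checked against labelled data.

```python
import numpy as np

def kmeans(features, k, n_iter=50):
    """Plain Lloyd-style k-means on an (n_samples, n_dims) array."""
    features = np.asarray(features, dtype=float)
    # Deterministic initialisation: pick k samples spread across the
    # samples sorted by their mean feature value.
    order = np.argsort(features.mean(axis=1))
    idx = order[np.linspace(0, len(order) - 1, k).astype(int)]
    centroids = features[idx].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

def extreme_cluster_masks(features, k=3):
    """Return boolean masks for the clusters with the lowest and highest mean feature."""
    labels, centroids = kmeans(features, k)
    cluster_level = centroids.mean(axis=1)
    return labels == cluster_level.argmin(), labels == cluster_level.argmax()

# Synthetic stand-in for per-segment features (assumption: not real pyknogram data).
rng = np.random.default_rng(0)
demo = np.concatenate([rng.normal(m, 0.3, size=(50, 1)) for m in (0.0, 5.0, 10.0)])
low_mask, high_mask = extreme_cluster_masks(demo, k=3)
print(low_mask.sum(), high_mask.sum())
```

Using more than two clusters leaves room for an ambiguous middle group, so only the segments that fall in an extreme cluster are treated as confident decisions.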

prachiisc commented 5 years ago

Hi Navid, Thank you for answering my question. I want to elaborate on my problem. I am working on diarization. My files are mostly meeting conversations with lots of overlap. I extract 1.5 sec i-vectors with a 0.75 sec shift on a recording of around 2 min. Now I want to detect whether each 1.5 sec segment has overlap or not. In your paper you give a measure, 'D_ovl', for the difference between adjacent frames of the pyknogram, and it suggests using a 2-segment average of these scores for better performance. For now I computed one average score for each 1.5 sec segment and plotted it, but I am not able to find regions of overlap using that.

As you suggest, should I try k-means on these scores to find the regions?

Would it be useful to build an LSTM model using these pyknograms?

Thanks & Regards, Prachi
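The per-segment averaging described in this comment might look like the sketch below. It assumes the frame-level scores (for example D_ovl values) are already available as a 1-D array and that the frame rate is known; `segment_means` and `frame_rate_hz` are hypothetical names, and the 100 frames per second used in the example is an assumption, not a value from the repository.

```python
import numpy as np

def segment_means(frame_scores, frame_rate_hz, win_s=1.5, shift_s=0.75):
    """Average frame-level scores over sliding windows of win_s seconds, hopping by shift_s seconds."""
    frame_scores = np.asarray(frame_scores, dtype=float)
    win = int(round(win_s * frame_rate_hz))
    shift = int(round(shift_s * frame_rate_hz))
    if win <= 0 or shift <= 0:
        raise ValueError("window and shift must cover at least one frame")
    if frame_scores.size == 0:
        return np.array([])
    # Only full windows are used, except when the input is shorter than one window.
    last_start = max(len(frame_scores) - win, 0)
    starts = range(0, last_start + 1, shift)
    return np.array([frame_scores[s:s + win].mean() for s in starts])

# Example: 3 seconds of dummy scores at an assumed 100 frames per second.
scores = np.arange(300, dtype=float)
seg = segment_means(scores, frame_rate_hz=100)
print(seg)
```

Each value in `seg` lines up with one 1.5 sec i-vector segment, so the averaged scores can be plotted or clustered alongside the diarization segments.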

idnavid commented 5 years ago

There should be at least a slight visual indication of change in the D_ovl outputs before you use k-means. So if you're saying you can't see any difference, k-means might not work.

In general the pyknograms are pretty robust to real noise conditions. Maybe this paper will give you a better idea of how to approach your problem. https://ieeexplore.ieee.org/abstract/document/7178867

I'm not sure how exactly you'd go about using LSTMs for this. It might work well, but I don't have any experience with how it would work out. I tried in the past to train an HMM using Kaldi (you may have seen some traces of that code in the repo), but I never followed through.

Thanks, Navid

prachiisc commented 5 years ago

Thank you Navid for the help. I will get back to you in case I need any help with code.

Regards, Prachi

prachiisc commented 5 years ago

Hi Navid, Some of my files are around 5-10 minutes long, and I am not able to generate pyknograms for them. The script gets stuck and gives a blank plot as output. Can you suggest something?

Regards, Prachi

idnavid commented 5 years ago

Hi @prachiisc , Sorry for the one day delay. I was a bit busy yesterday.

I looked into the code and was only able to get reasonable run times for 1-minute files. If possible, could you split your files into shorter segments (1 minute long or less)? This wasn't an issue for me back when I was working on this, because I used to split the files and run the code on a cluster.
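Splitting a long recording into pieces of at most one minute could be done along these lines. This is a sketch that assumes the audio has already been loaded into a NumPy array with a known sample rate; `split_into_chunks` is a hypothetical helper and not part of the repository.

```python
import numpy as np

def split_into_chunks(samples, sample_rate, max_seconds=60.0):
    """Cut a 1-D sample array into consecutive chunks of at most max_seconds each."""
    chunk_len = int(max_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("max_seconds * sample_rate must be at least one sample")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]

# Example: 150 seconds of silence at 8 kHz gives two full chunks and one partial chunk.
audio = np.zeros(8000 * 150)
chunks = split_into_chunks(audio, sample_rate=8000)
print([len(c) for c in chunks])
```

Each chunk can then be passed to the feature extraction separately, keeping the chunk index so that the results can be mapped back to positions in the original file.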

Also a few other suggestions:

  • I've made an update to the code and removed references to paths. The new update might be a bit easier to plug and play into your system.
  • I would put less weight on the output of overlap_decision.py, since it is just an example of how to simply treat pyknograms.
  • It might be better to use an 8 kHz sampling rate. I checked and the code works for other sampling rates, but since I did most of my work at 8 kHz back in the day, there's a chance that it's more reliable.

Cheers, Navid

prachiisc commented 5 years ago

Hi Navid, Thank you for the help. The dataset I want to test on is sampled at 16 kHz, and I don't want to downsample it. Can you provide the dataset you used for the paper so that I can verify my setup against it? You referred to the GRID dataset, but it does not contain overlaps. So did you get those from the source separation challenge?

Right now, I am feeding 1.5 sec chunks of samples to the pyknogram function and taking the mean of the scores it outputs. But it is still slow.

Regards

idnavid commented 5 years ago

Sample rate conversion shouldn't be very important.

Most of the data I used is publicly available. The artificial overlap data was from the source separation challenge. For real experiments I used the AMI corpus. I also created naturalistic overlap by mixing the two channels available in some of NIST SRE phone-call data.
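Creating overlap by mixing two channels can be sketched as below. The thread does not describe the exact mixing or labelling procedure, so this is only one plausible approach: the channels are summed sample by sample, and a frame is marked as overlapped when both channels are active according to a simple amplitude threshold. The function names, the frame length and the threshold are all assumptions.

```python
import numpy as np

def mix_two_channels(stereo):
    """Sum an (n_samples, 2) float array into one co-channel signal, rescaling only if it would clip."""
    stereo = np.asarray(stereo, dtype=float)
    if stereo.ndim != 2 or stereo.shape[1] != 2:
        raise ValueError("expected an (n_samples, 2) array")
    mono = stereo[:, 0] + stereo[:, 1]
    peak = np.abs(mono).max() if mono.size else 0.0
    return mono / peak if peak > 1.0 else mono

def both_active(stereo, frame_len, threshold):
    """Mark frames in which the mean absolute amplitude of both channels exceeds threshold."""
    stereo = np.asarray(stereo, dtype=float)
    n_frames = len(stereo) // frame_len
    frames = np.abs(stereo[: n_frames * frame_len]).reshape(n_frames, frame_len, 2)
    level = frames.mean(axis=1)  # shape (n_frames, 2): one level per frame and channel
    return (level > threshold).all(axis=1)

# Toy example: channel 0 is active in frames 0-1, channel 1 in frames 1-2.
stereo = np.zeros((400, 2))
stereo[0:200, 0] = 0.5
stereo[100:300, 1] = 0.5
mixed = mix_two_channels(stereo)
overlap = both_active(stereo, frame_len=100, threshold=0.1)
print(overlap)
```

The separate channels give frame-level reference labels for free, which is what makes this kind of mixed data convenient for checking an overlap detector.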

1.5 sec sounds reasonable. Looking back at the code I can see that there are some inefficiencies that could be addressed. Unfortunately, I'm not in a position right now to spend more time on it.

Good luck, N

prachiisc commented 5 years ago

Thank you Navid for the help so far. I will try to proceed further using your suggestions.

Regards, Prachi
