mogwai / nanodrz

Speaker Diarization with Transformers

I'm interested in this... #1

Open francqz31 opened 7 months ago

francqz31 commented 7 months ago

Hey author, I really like the architecture and the technique. I was looking for something like this to diarize 1k+ hours of different speakers for TTS, as accurately as possible. I'd like to see some results from nanodrz in real use, for example on this video: https://streamable.com/m5xvgf

I would like to contribute compute or knowledge to scale this up so that it becomes the new SOTA, or gets to 99-100% accuracy on an unknown number of speakers.

Thanks in advance

mogwai commented 6 months ago

Thanks for your interest. I'm just having one last clean of the data and rejigging the synthetic generation for a last run to see if I can improve the model; my notes are all in the readme. My biggest issue is just how slow the data processing is at the moment, and I'm getting slightly distracted by solving that problem :)

My new mega moonshot is to run all the audio through a denoiser before training. This can be seen as a kind of normalisation step and will hopefully mean that new data won't be so "out of domain".

I'll hopefully have some results from this at the end of the week.

Compute-wise, if you have an A100 I can ssh into, that would definitely speed things up :)

Feel free to DM me on Signal.

francqz31 commented 6 months ago

  1. Amazing. Well, I'm short of A100s now (I used to have 9). I have an RTX 4090 and an RTX 3090; one is busy training and one is not, so I don't know if that would help?
  2. I can recommend some of the best or SOTA denoiser/speech enhancement algos if you want.

mogwai commented 6 months ago

  1. I've got two 4090s and am due to get some A100s / H100s from LAION.
  2. Yes please! I'm not too worried about this being perfect yet; I want to see its effects first.

francqz31 commented 6 months ago

Ok, wonderful. Also, once I have my 9 A100s back I will still offer them if you need them for any interesting project :) For denoising and enhancement, the best thing so far is https://github.com/yxlu-0102/MP-SENet :) Try it if you want and see whether it suits your usage; if not, I will recommend something else. In my use case it works the best.

francqz31 commented 6 months ago

There is also HiFi-GAN v2 ( https://daps.cs.princeton.edu/projects/Su2021HiFi2/ ), but no code is available for it. Later I might try implementing it starting from https://github.com/rishikksh20/hifigan-denoiser (an unofficial implementation of v1) and adding something more to it.