@hbredin as you mentioned in #186, I would also prefer to have a single instantiation of `Powerset` that runs on the same device as `SegmentationModel`.

I think we have 2 options here:

1. A `PowersetAdapter` wrapper, so that `PyannoteLoader` can do something like `return PowersetAdapter(Model.from_pretrained(model_info))`.
2. A `PowersetToMultilabel` block that simply expects a powerset input and does the conversion. For this, we'd have to know from the model whether it is powerset or not, for example by adding a `@property` abstract method to `SegmentationModel`. This could simply default to `False` so that it isn't a concern for most users (see the sketch after the loader code below).

I would prefer the first one for now because it's automatic and has minimal impact, but we may have to move to the second one if someone else (other than pyannote) releases a powerset model.
```python
import torch
import torch.nn as nn
from pyannote.audio.utils.powerset import Powerset

class PowersetAdapter(nn.Module):
    def __init__(self, segmentation_model: nn.Module):
        super().__init__()  # required before registering sub-modules
        self.model = segmentation_model
        self.powerset = Powerset(...)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # convert powerset output back to multilabel speaker activations
        return self.powerset.to_multilabel(self.model(waveform), soft=False)
```
```python
class PyannoteLoader:
    ...

    def __call__(self) -> nn.Module:
        model = pyannote_loader.get_model(self.model_info, self.hf_token)
        specs = getattr(model, "specifications", None)
        # wrap powerset models so the rest of the pipeline
        # always sees multilabel speaker activations
        if specs is not None and specs.powerset:
            model = PowersetAdapter(model)
        return model
```
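For reference, a minimal sketch of what option 2 could look like. Everything below is hypothetical (the `is_powerset` property, the block name, and the `Powerset` constructor arguments), meant only to illustrate the idea:

```python
import torch
import torch.nn as nn
from pyannote.audio.utils.powerset import Powerset


class SegmentationModel(nn.Module):
    # hypothetical addition to the existing base class:
    # defaults to False so most users never have to care about it
    @property
    def is_powerset(self) -> bool:
        return False


class PowersetToMultilabel(nn.Module):
    # hypothetical block that expects powerset scores and converts them
    def __init__(self, num_classes: int, max_set_size: int):
        super().__init__()
        self.powerset = Powerset(num_classes, max_set_size)

    def forward(self, powerset_scores: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(powerset_scores)
```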
Trying this, but now `diart.stream` complains that `AttributeError: 'PyannoteSegmentationModel' object has no attribute 'duration'`, even though I added the following properties to `PowersetAdapter`:

```python
@property
def sample_rate(self) -> int:
    return self.model.hparams.sample_rate

@property
def duration(self) -> float:
    return self.model.specifications.duration
```

A bit lost here but it's late :-) Sleep will most likely help!
@hbredin that's weird, can you push the code so I can take a look?
You probably need to forward `specifications` from `PowersetAdapter` to `Model`:
```python
class PowersetAdapter(nn.Module):
    def __init__(self, segmentation_model: nn.Module):
        super().__init__()
        self.model = segmentation_model
        self.powerset = Powerset(...)

    @property
    def specifications(self):
        return getattr(self.model, "specifications", None)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(self.model(waveform), soft=False)
```
Because `PyannoteSegmentationModel` will need the loaded model to have `model.specifications.duration` and `model.specifications.sample_rate`.

Again, this will disappear when I move the config to a yaml file. That way we won't need a default duration or sample rate, it will be expected in the config or CLI args.
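Purely for illustration, such a config could look like the following. Every key and value here is hypothetical, not an agreed-upon format:

```yaml
# hypothetical diart config.yml (illustrative only)
segmentation: pyannote/segmentation-3.0
embedding: pyannote/embedding
duration: 10.0      # chunk duration in seconds
step: 0.5           # sliding window step in seconds
latency: 5.0        # output latency in seconds
sample_rate: 16000  # expected input sample rate in Hz
```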
Thanks! I'll try to debug after work today or tomorrow and get back if it's not solved by then 😄
Adding the `specifications` property does not help.
@hbredin can you post the stacktrace?
```
$ diart.stream --segmentation pyannote/segmentation-3.0 audio.wav
Traceback (most recent call last):
  File "REDACTED/bin/diart.stream", line 8, in <module>
    sys.exit(run())
  File "REDACTED/diart/src/diart/console/stream.py", line 107, in run
    pipeline = pipeline_class(config)
  File "REDACTED/diart/src/diart/blocks/diarization.py", line 97, in __init__
    msg = f"Latency should be in the range [{self._config.step}, {self._config.duration}]"
  File "REDACTED/diart/src/diart/blocks/diarization.py", line 74, in duration
    self._duration = self.segmentation.duration
  File "REDACTED/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'PyannoteSegmentationModel' object has no attribute 'duration'
```
@hbredin while we figure this out, you can override the duration with `--duration 5`. For the sample rate, which I imagine will be a similar problem, you can temporarily hard-code it in `SpeakerDiarizationConfig` (see the sketch below). That should unblock you for now to try out the new model.
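A rough sketch of that temporary hack. The property name and surrounding class details are assumptions about diart's internals, so adapt as needed:

```python
# inside SpeakerDiarizationConfig (illustrative only)
@property
def sample_rate(self) -> int:
    # temporary hack: instead of reading the rate from the model,
    # hard-code the 16kHz expected by pyannote models
    return 16000
```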
Ah! `--duration=10` solves this first issue but it now complains about a missing sample_rate :-)

And there does not seem to be a `--sample-rate` option :-/
Ok. Misread your previous comment. You were already aware of it :)
@hbredin I don't know if you branched from main but I highly recommend rebasing on top of develop now that #188 is merged
@hbredin we just broke a record here, performance on AMI using duration=10, step=0.5 and latency=5 (same as the paper except for the 10s context) gives DER=26.7. Previous best on AMI for that config was 27.3
This is without tuning `rho_update` and `delta_new`, which should squeeze out a bit more performance. I would like to run the tuning myself but I fear my laptop will catch fire 😅 I'd really like to have a caching feature for that
Wait until I try with `pyannote.premium` ;-)
What's the command line I should run?
All checks are failing but I don't think they are related to this PR.
Yeah, don't worry about the "Quick Runs" CI failures, they're unrelated. That workflow needs a huggingface token to run, and it can't find it in your fork's secrets. This is actually why I want to host a pair of freely available ONNX models somewhere to run the CI, probably even quantized models.
However, please format with black so the lint passes.
You can run the following command for the AMI eval:
```
diart.benchmark /ami/audio/test --reference /ami/rttms/test \
  --segmentation pyannote/segmentation-3.0 \
  --duration 10 --latency 5 --step 0.5 \
  --tau-active 0.507 --rho-update 0.006 --delta-new 1.057 \
  --batch-size 32 --num-workers 3
```
Now you start to see why I want to put configs in a yml file 😅
@hbredin looks like something went wrong with your rebase. I'm missing changes from #188
If I am not mistaken @hbredin, we should not need to tune Rho (i.e. the speaker threshold) for the Powerset model, so it might be worth it to subclass the `SpeakerDiarization` pipeline with a custom `hyper_parameters()` function? A sketch follows below.

Edit: Rho should be tuned, see below, I confused rho and tau.
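A minimal sketch of that subclassing idea, excluding `tau_active` per the correction above. The imported names (`HyperParameter`, `RhoUpdate`, `DeltaNew`) and the `hyper_parameters()` signature are assumptions about diart's `blocks.base` module, so double-check them:

```python
from typing import Sequence

from diart.blocks.base import HyperParameter, RhoUpdate, DeltaNew  # assumed names
from diart.blocks.diarization import SpeakerDiarization


class PowersetSpeakerDiarization(SpeakerDiarization):
    # hypothetical pipeline: identical to SpeakerDiarization,
    # but tau_active is excluded from hyper-parameter tuning
    @staticmethod
    def hyper_parameters() -> Sequence[HyperParameter]:
        return [RhoUpdate, DeltaNew]
```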
@sorgfresser you may still want to tune `rho_update`, the powerset model doesn't relieve you of this parameter. It's removing the embedding model that would help you with that.

Keep in mind that `rho_update` can be interpreted as "what percentage of the chunk must this embedding represent in order to update the clustering centroids?". For example, with `rho_update=0.3` and 5-second chunks, a speaker must account for at least 1.5 seconds of speech in a chunk before their centroid gets updated.
Sorry, I was referring to Tau, it's getting late...
Ok, apart from the linting and the `Inference` import that we should remove, this is good to go from my side. I'll wait for those changes and merge.
Quick `pyannote.premium` run without any hyper-parameter tuning:

```
diart.benchmark /ami/audio/test --reference /ami/rttms/test \
  --segmentation ... \
  --latency ... --step 0.5 \
  --tau-active 0.507 --rho-update 0.006 --delta-new 1.057
```
| Segmentation | Embedding | Latency | FA | MD | SC |
|---|---|---|---|---|---|
| pyannote/segmentation-3.0 | pyannote/embedding | 5s | 3.7 | 10.1 | 12.6 |
| pyannote/premium 👀 | pyannote/embedding | 1s 👀 | 3.8 | 7.6 🎉 | 16.4 😠 |
`FA` = false alarm rate / `MD` = missed detection rate / `SC` = speaker confusion rate (DER is the sum of the three).
Looks like `pyannote/embedding` default clustering hyper-parameters (degraded SC 😠) are not adapted to `pyannote/premium` segmentation (improved FA+MD 🎉).

Still needs a bit of hparams tuning but very promising!
@hbredin nice! I see you've been having fun with `diart.benchmark` then 😄
Sidenote: this requires the pyannote develop version as of now, since pyannote/pyannote-audio#1516 is needed. Not sure when I'll release that, so it would be safer to remove the use of `soft=False`, which is anyway the default behavior in pyannote.audio 3.0.1 (see the one-line change below).
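Concretely, that would just mean dropping the keyword argument in the adapter's `forward`, assuming the sketch from earlier in the thread:

```python
# works on both pyannote.audio 3.0.1 and develop:
# hard (argmax) powerset-to-multilabel conversion is the default
return self.powerset.to_multilabel(self.model(waveform))
```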
Addresses #186.

Note that this is a first (working) attempt that still needs some love. Hence the draft status...

As a bonus, you get the first (?) walrus operator of `diart`, yay!