Open hbredin opened 1 year ago
Thanks for the heads-up @hbredin! 🙌 What's the recommended way to extract a Pythonic version from the diarization output (e.g. as a list or dict)?
I'd use Annotation.itertracks.
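For anyone landing here, a minimal sketch of what that could look like. The helper name annotation_to_segments is made up for illustration, and the dict layout (segment / track / label) is only my understanding of what the old for_json()["content"] entries looked like, so treat it as an assumption. The only pyannote.core call it relies on is Annotation.itertracks(yield_label=True), which yields (segment, track, label) tuples.

```python
def annotation_to_segments(annotation):
    """Turn a diarization result into a plain list of dicts.

    `annotation` is expected to be a pyannote.core Annotation (or anything
    exposing the same itertracks(yield_label=True) method).
    The output layout is an assumption meant to resemble the old
    for_json()["content"] entries; adjust it to whatever you need.
    """
    return [
        {
            "segment": {"start": segment.start, "end": segment.end},
            "track": track,
            "label": label,
        }
        for segment, track, label in annotation.itertracks(yield_label=True)
    ]
```

With a real pipeline you would call it as annotation_to_segments(diarization), where diarization is the Annotation returned by the pyannote pipeline. The result is made of plain dicts, strings and numbers, so it can be passed to json.dumps as long as the track names are JSON-friendly.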
This will break after the pyannote.core 5.x branch is released.
@hbredin @sanchit-gandhi 7 months later, this is broken. Any insight to fix this?
I've submitted a PR addressing this issue.
Why did they remove the JSON serialization in pyannote.core 5.x?
Hi,
1) I am following a tutorial on huggingface and ran into this 🙆♂️🤦♂️
2) Attempts to use older versions of dependencies simply don't work in 2024
Any workaround here?
The following step results in an error due to the missing for_json:
pipeline(sample["audio"].copy())
Below is the exception stacktrace:
{
"name": "AttributeError",
"message": "'Annotation' object has no attribute 'for_json'",
"stack": "---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[51], line 1
----> 1 pipeline(sample[\"audio\"].copy())
File ~/.venv/lib/python3.10/site-packages/speechbox/diarize.py:90, in ASRDiarizationPipeline.__call__(self, inputs, group_by_speaker, **kwargs)
83 inputs, diarizer_inputs = self.preprocess(inputs)
85 diarization = self.diarization_pipeline(
86 {\"waveform\": diarizer_inputs, \"sample_rate\": self.sampling_rate},
87 **kwargs,
88 )
---> 90 segments = diarization.for_json()[\"content\"]
92 # diarizer output may contain consecutive segments from the same speaker (e.g. {(0 -> 1, speaker_1), (1 -> 1.5, speaker_1), ...})
93 # we combine these segments to give overall timestamps for each speaker's turn (e.g. {(0 -> 1.5, speaker_1), ...})
94 new_segments = []
AttributeError: 'Annotation' object has no attribute 'for_json'"
}
PS: The tutorial also uses for_json in previous steps. I was able to replace that using outputs.itertracks, but this last usage is within the speechbox library itself on huggingface, and I don't see any newer release of speechbox in the past 6 months.
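For reference, the merging step described in the comments of that traceback (combining consecutive segments from the same speaker into one turn) can be sketched without for_json at all. This is not the speechbox implementation, just a rough illustration; the function name merge_consecutive_turns and the input layout (dicts with a "segment" start/end and a "label", e.g. built from itertracks) are assumptions.

```python
def merge_consecutive_turns(segments):
    """Merge back-to-back segments that share the same speaker label.

    `segments` is assumed to be a time-ordered list of dicts shaped like
    {"segment": {"start": float, "end": float}, "label": str}.
    Returns a list of {"label", "start", "end"} dicts, one per speaker turn.
    """
    merged = []
    for item in segments:
        start = item["segment"]["start"]
        end = item["segment"]["end"]
        label = item["label"]
        if merged and merged[-1]["label"] == label:
            # Same speaker as the previous turn: extend that turn.
            merged[-1]["end"] = max(merged[-1]["end"], end)
        else:
            # New speaker: start a new turn.
            merged.append({"label": label, "start": start, "end": end})
    return merged


example = [
    {"segment": {"start": 0.0, "end": 1.0}, "label": "speaker_1"},
    {"segment": {"start": 1.0, "end": 1.5}, "label": "speaker_1"},
    {"segment": {"start": 1.5, "end": 2.0}, "label": "speaker_2"},
]
turns = merge_consecutive_turns(example)
# turns now holds one entry for speaker_1 (0.0 to 1.5) and one for speaker_2 (1.5 to 2.0)
```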
As per https://github.com/huggingface/speechbox/pull/26/files, installing speechbox from the git repo seems to do the job (the tutorial also recommends this; I ignored that and installed from PyPI 🤦♂️).
Good to close.
https://github.com/huggingface/speechbox/blob/82a07082576ecbe9d3f2522e5909aede3b37abd4/src/speechbox/diarize.py#L90
This will break after the pyannote.core 5.x branch is released. Source: https://github.com/pyannote/pyannote-core/commit/c89a9cfff103e3aa96c8518c4193a43e493c4790