pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License
5.88k stars 752 forks

Difference between develop branch and PyPI #1424

Closed mabergerx closed 5 months ago

mabergerx commented 1 year ago

I am trying to use pyannote.audio in an air-gapped environment with its own PyPI proxy, so that I can run pip install pyannote.audio.

However, with this method I encounter all kinds of problems that I can't reproduce on Google Colab when installing with pip install -qq https://github.com/pyannote/pyannote-audio/archive/refs/heads/develop.zip. There, both running the voice activity pipeline via the Hugging Face loader and loading the model from local files work fine.

So these are the scenarios that work:

Google Colab
1) Install through pip install -qq https://github.com/pyannote/pyannote-audio/archive/refs/heads/develop.zip
2) Instantiate the pipeline and run it:

from pyannote.audio import Pipeline
from pyannote.audio import Inference
import numpy as np

from IPython.display import Audio

audio_filepath = "/content/my_file.mp3"

# Login with HF credentials, etc, and then...
pipeline = Pipeline.from_pretrained("pyannote/voice-activity-detection")

initial_params = {"onset": 0.9, "offset": 0.8,
                  "min_duration_on": 0.05537587440407595, "min_duration_off": 0.09791355693027545}
pipeline.instantiate(initial_params)

# This works
audio_timeline = pipeline(audio_filepath).get_timeline()

...and also this scenario:

# Upload the model file and config.yaml to Colab, and then...

from pyannote.audio import Pipeline
from pyannote.audio import Inference
import numpy as np

from IPython.display import Audio

audio_filepath = "/content/my_file.mp3"

offline_vad = Pipeline.from_pretrained("/content/config.yaml")

initial_params = {"onset": 0.9, "offset": 0.8,
                  "min_duration_on": 0.05537587440407595, "min_duration_off": 0.09791355693027545}
offline_vad.instantiate(initial_params)

# This also works
audio_timeline = offline_vad(audio_filepath)

Now here are the scenarios that don't work:

My air-gapped environment
1) Install through pip install pyannote.audio
2) Upload the segmentation model files used in the example above to the environment and try to load them:

from pyannote.audio import Pipeline
from pyannote.audio import Inference

import numpy as np

from IPython.display import Audio

audio_filepath = "my_file.mp3"

offline_vad = Pipeline.from_pretrained("config.yaml")

initial_params = {"onset": 0.9, "offset": 0.8,
                  "min_duration_on": 0.05537587440407595, "min_duration_off": 0.09791355693027545}
offline_vad.instantiate(initial_params)

# Loading the pipeline works: it returns a VAD pipeline instance without errors, but then...

audio_timeline = offline_vad("my_file.mp3")
>>> RuntimeError: Error opening 'my_file.mp3': Error : bad map offset.
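One hedged workaround for this decoding error (the PyPI release appears to read files through libsndfile, whose builds often lack mp3 support): decode the audio yourself and pass it in memory, since pyannote.audio pipelines also accept a mapping with "waveform" and "sample_rate" keys. The helper below is a minimal sketch of that packaging step; the torchaudio decoding is shown only as comments because it needs the actual file.

```python
def as_pyannote_input(waveform, sample_rate):
    """Package in-memory audio the way pyannote.audio pipelines accept it:
    a mapping with "waveform" and "sample_rate" keys."""
    return {"waveform": waveform, "sample_rate": sample_rate}

# Hypothetical usage (needs torchaudio and the real file, so left as comments):
#   import torchaudio
#   waveform, sample_rate = torchaudio.load("my_file.mp3")
#   audio_timeline = offline_vad(as_pyannote_input(waveform, sample_rate))
```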

Google Colab
1) Install through pip install pyannote.audio
2) Upload the segmentation model files used in the example above to the Colab and try to load them:

# This instantly gives an error
from pyannote.audio import Pipeline
>>> OSError: /usr/local/lib/python3.10/dist-packages/torchtext/lib/libtorchtext.so: undefined symbol: _ZN2at4_ops10select_int4callERKNS_6TensorElN3c106SymIntE
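An undefined-symbol error like this one usually means torchtext was compiled against a different torch version than the one currently installed (Colab upgrades torch frequently). A minimal stdlib check for that kind of mismatch might look like the sketch below; the idea that torch and torchtext must come from matching release lines is the assumption here, and the exact pairing table is not reproduced.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version of *package*, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def minor_line(version):
    """Reduce a version string like "2.0.1" to its minor line, "2.0"."""
    return ".".join(version.split(".")[:2])

# A torch/torchtext mismatch across release lines is a strong hint that
# torchtext needs to be reinstalled for the current torch.
torch_v = installed_version("torch")
torchtext_v = installed_version("torchtext")
if torch_v and torchtext_v:
    print("torch", minor_line(torch_v), "/ torchtext", minor_line(torchtext_v))
```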

In summary, when installing the library from the develop branch on Google Colab, I can use both the Hugging Face initialization and local files to produce the desired results. With the PyPI version, however, I get errors both in my own environment and on Colab (a different error in each, even though the data stays the same). Since I can't easily install the develop branch from GitHub in my environment, I wonder why this behaviour occurs.
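For getting the develop snapshot into an air-gapped environment, one possible route (a sketch only; the exact commands depend on your PyPI proxy, OS, and Python version) is to download the snapshot plus all dependency wheels on a connected machine of the same platform, copy the directory across, and install with pip's offline flags. The commands are printed rather than executed here because the first one needs internet access.

```python
# Hypothetical offline-install recipe: the commands below are printed for
# illustration, not run, since the download step requires network access.
steps = [
    # On a connected machine (same OS and Python as the air-gapped one):
    "pip download "
    "https://github.com/pyannote/pyannote-audio/archive/refs/heads/develop.zip "
    "-d ./wheels",
    # After copying ./wheels into the air-gapped environment:
    "pip install --no-index --find-links ./wheels pyannote.audio",
]
for step in steps:
    print(step)
```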

Thanks.

github-actions[bot] commented 1 year ago

Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

We also offer paid scientific consulting services around speaker diarization (and speech processing in general).

This is an automated reply, generated by FAQtory

triinity2221 commented 1 year ago

Hi @mabergerx, I have the same issue. Were you able to resolve it?

stale[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.