pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Speech detection/change detection error: __init__() missing 1 required positional argument #96

Closed bml1g12 closed 6 years ago

bml1g12 commented 6 years ago

Ubuntu 14.04.5 LTS

I'm following the tutorial here: https://github.com/pyannote/pyannote-audio/tree/master/tutorials/speech-activity-detection

And have followed the installation steps:

$ conda install gcc
$ conda install -c conda-forge yaafe
$ pip install "pyannote.audio==0.2.1"

I'm using the AMI database, and Python 3.6 because the database module requires f-strings (a Python 3.6 feature).

My ~/.pyannote/config.yml reads AMI: /path_to_ami_corpus/amicorpus/*/audio/{uri}.wav

which I hope is the correct way of allowing pyannote.database to scan the amicorpus. It seemed to work for the pyannote-speech-feature command.

When I run: pyannote-speech-detection train ${EXPERIMENT_DIR} AMI.SpeakerDiarization.MixHeadset

I get the following traceback:

Traceback (most recent call last):
  File "/home/bml1g12/anaconda3/envs/py35-pyannote-audio/bin/pyannote-speech-detection", line 11, in <module>
    sys.exit(main())
  File "/home/bml1g12/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/pyannote/audio/applications/speech_detection.py", line 522, in main
    application.train(protocol_name, subset=subset)
  File "/home/bml1g12/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/pyannote/audio/applications/speech_detection.py", line 275, in train
    batch_size=batch_size)
  File "/home/bml1g12/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/pyannote/audio/generators/speech.py", line 50, in __init__
    super().__init__(segment_generator, batch_size=batch_size)
TypeError: __init__() missing 1 required positional argument: 'signature'

The same error applies to pyannote-change-detection.

I would be very grateful for any ideas about what the solution could be.

I'm using the same config.yml in EXPERIMENT_DIR as the tutorial. I notice that it doesn't include the number of MFCCs - so does it assume you've already calculated them using pyannote-speech-feature? If so, how do I specify the path of the extracted MFCCs?

Thanks for any help

hbredin commented 6 years ago

Looks like this is due to a breaking change in the latest version of pyannote.generators.

Can you try with pyannote.generators <= 0.17.1 and let me know if that solves your issue?

bml1g12 commented 6 years ago

Thanks for the swift feedback. I downgraded to 0.17.1 and get:

pyannote-speech-detection train ${EXPERIMENT_DIR} AMI.SpeakerDiarization.MixHeadset
Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/bml1g12/anaconda3/envs/pyannote/bin/pyannote-speech-detection", line 7, in <module>
    from pyannote.audio.applications.speech_detection import main
  File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/applications/speech_detection.py", line 147, in <module>
    from pyannote.audio.generators.speech import \
  File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/generators/speech.py", line 33, in <module>
    from pyannote.databse.util import get_annotated
ModuleNotFoundError: No module named 'pyannote.databse'

Fixing what I think is a typo of 'databse' to 'database' gives a different error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/bml1g12/anaconda3/envs/pyannote/bin/pyannote-speech-detection", line 11, in <module>
    sys.exit(main())
  File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/applications/speech_detection.py", line 591, in main
    application.train(protocol_name, subset=subset)
  File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/applications/speech_detection.py", line 308, in train
    optimizer=SSMORMS3(), log_dir=train_dir)
  File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/labeling/base.py", line 126, in fit
    verbose=1, callbacks=callbacks)
  File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/keras/engine/training.py", line 2080, in fit_generator
    self._make_train_function()
  File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/keras/engine/training.py", line 992, in _make_train_function
    loss=self.total_loss)
TypeError: get_updates() missing 1 required positional argument: 'constraints'
hbredin commented 6 years ago

This is likely due to a breaking change in Keras.

Hmmmm. Do you mind trying the develop branch of pyannote-audio? I believe all these bugs were fixed in there...

bml1g12 commented 6 years ago

OK, I'm now using the develop branch. It looks like it now requires torch, so I installed it (pip install torch), which gave me torch-0.4.0. I get the following error:

(pyannote-devel) bml1g12@Batfred-PC:/mnt/e/AMI_corpus/speech-activity-detection$ pyannote-speech-detection train ${EXPERIMENT_DIR} AMI.SpeakerDiarization.MixHeadset
Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/bml1g12/anaconda3/envs/pyannote-devel/bin/pyannote-speech-detection", line 11, in <module>
    load_entry_point('pyannote.audio', 'console_scripts', 'pyannote-speech-detection')()
  File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/pkg_resources/__init__.py", line 480, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2691, in load_entry_point
    return ep.load()
  File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2322, in load
    return self.resolve()
  File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2328, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/bml1g12/pyannote-audio/pyannote/audio/applications/speech_detection.py", line 180, in <module>
    from pyannote.audio.labeling.base import SequenceLabeling
  File "/home/bml1g12/pyannote-audio/pyannote/audio/labeling/base.py", line 41, in <module>
    import torch.nn as nn
  File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/torch/__init__.py", line 78, in <module>
    from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS

This is apparently a known issue, with workarounds involving shuffling import orders; I guess you aren't experiencing this problem, so I wonder if you may be using a different version of torch?

hbredin commented 6 years ago

Hmmmm. Actually, only the speaker embedding part really uses pytorch. All other modules (speech activity detection, speaker change detection) still rely on Keras.

I guess, getting rid of the "import torch" stuff should do the trick (at least for speech activity detection and speaker change detection).

FYI, I am currently rewriting the whole library to only use pytorch (in a private repo for now). I hope to have it ready by Interspeech 2018 but cannot really promise an ETA...

bml1g12 commented 6 years ago

Sorry for the slow reply, I've been testing a few different versions of pyannote and pip/conda packages to see what I can get working. Thanks for creating such a wonderful set of tools, I look forward to the next version. In the meantime:

Using the develop branch, and after installing pytorch on a new Linux machine (so that "import torch.nn as nn" no longer produces an error), I seem to be able to train the model just fine. But when it comes to the tuning step, I find that tuning outputs only zeros (and uses only the CPU), producing this sort of output:

pyannote-speech-detection tune ${TRAIN_DIR} AMI.SpeakerDiarization.MixHeadset

Iteration No: 1 ended. Evaluation done at provided point.
Time taken: 1026.3768
Function value obtained: 0.0000
Current minimum: 0.0000

tuning_error.txt

And subsequent test results from pyannote-metrics show very low accuracy. Do you have any idea what could be going wrong?


I managed to get the Speaker Embedding functionality working (and with GPU support) using a specific combination of pip/conda libraries. I've included a pip file and a conda file to allow other users to reproduce this environment; to use them, simply follow these steps:

1. conda create -n new_env --file env-conda.txt  # installs pip, as
specified in the env-conda.txt file
2. source activate new_env  # now pip is in your path
3. pip install -r env-pip.txt

Attachments: env-conda.txt, env-pip.txt. I also needed to manually install some missing libraries.


hbredin commented 6 years ago

I just started working again on change detection on AMI and came to the same conclusion. You might want to change the --purity constraint to a lower value when tuning, because the default value (0.9 = 90% purity) is probably a bit too high for AMI (whose manually annotated speech turn boundaries are not precise enough).

As for using only the CPU, this is somewhat expected, as tuning only consists of trying several peak detection thresholds and then using pyannote.metrics to evaluate. The GPU is only used once per epoch, to extract raw speaker change detection scores. Future versions should (at least) use all CPU cores (i.e. process one file per core).

bml1g12 commented 6 years ago

Thanks, that's useful to know, but the tuning issue I was referring to (with Function value obtained: 0) was actually with the pyannote-speech-detection tune command, which doesn't have a purity option, so maybe it's not possible to use pyannote-speech-detection on AMI at the moment?

By the way, in this paper I noticed they found some errors in the AMI labelling, particularly the timings of the speech turn boundaries. They did a forced alignment to try to fix these errors, and share the resulting files here on that GitHub too. I'm not sure if I should use them, as it's tough to know whether their forced alignment could be generating more errors than it's solving!

hbredin commented 6 years ago

I just released version 1.0 of pyannote.audio -- which is an almost complete rewrite. Feel free to re-open this issue in case it remains unsolved in v1.0.