Looks like this is due to a breaking change in the latest version of pyannote.generators. Can you try with pyannote.generators <= 0.17.1 and let me know if that solves your issue?
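Assuming a pip-based install, something like this should downgrade it in place:
pip install "pyannote.generators<=0.17.1"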
Thanks for the swift feedback. I downgraded to 0.17.1 and get:
pyannote-speech-detection train ${EXPERIMENT_DIR} AMI.SpeakerDiarization.MixHeadset
Using TensorFlow backend.
Traceback (most recent call last):
File "/home/bml1g12/anaconda3/envs/pyannote/bin/pyannote-speech-detection", line 7, in <module>
from pyannote.audio.applications.speech_detection import main
File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/applications/speech_detection.py", line 147, in <module>
from pyannote.audio.generators.speech import \
File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/generators/speech.py", line 33, in <module>
from pyannote.databse.util import get_annotated
ModuleNotFoundError: No module named 'pyannote.databse'
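The offending import (line 33 of pyannote/audio/generators/speech.py) presumably should read:
from pyannote.database.util import get_annotated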
Fixing what I think is a typo of 'databse' to 'database' gives a different error:
Using TensorFlow backend.
Traceback (most recent call last):
File "/home/bml1g12/anaconda3/envs/pyannote/bin/pyannote-speech-detection", line 11, in <module>
sys.exit(main())
File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/applications/speech_detection.py", line 591, in main
application.train(protocol_name, subset=subset)
File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/applications/speech_detection.py", line 308, in train
optimizer=SSMORMS3(), log_dir=train_dir)
File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/pyannote/audio/labeling/base.py", line 126, in fit
verbose=1, callbacks=callbacks)
File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/keras/engine/training.py", line 2080, in fit_generator
self._make_train_function()
File "/home/bml1g12/anaconda3/envs/pyannote/lib/python3.6/site-packages/keras/engine/training.py", line 992, in _make_train_function
loss=self.total_loss)
TypeError: get_updates() missing 1 required positional argument: 'constraints'
This is likely due to a breaking change in Keras.
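If so, maybe pinning a Keras release from before the optimizer get_updates() API changed would help? I haven't verified which exact version that is, but something like:
pip install "keras==2.0.6"
might be worth a try.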
Hmmmm. Do you mind trying the develop branch of pyannote-audio? I believe all these bugs were fixed in there...
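Something like this should get you there (assuming an editable install from a clone):
git clone https://github.com/pyannote/pyannote-audio.git
cd pyannote-audio
git checkout develop
pip install -e .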
OK, I'm now using the develop branch. It looks like it now requires torch, so I installed it (pip install torch) and got torch-0.4.0. I get the following error:
(pyannote-devel) bml1g12@Batfred-PC:/mnt/e/AMI_corpus/speech-activity-detection$ pyannote-speech-detection train ${EXPERIMENT_DIR} AMI.SpeakerDiarization.MixHeadset
Using TensorFlow backend.
Traceback (most recent call last):
File "/home/bml1g12/anaconda3/envs/pyannote-devel/bin/pyannote-speech-detection", line 11, in <module>
load_entry_point('pyannote.audio', 'console_scripts', 'pyannote-speech-detection')()
File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/pkg_resources/__init__.py", line 480, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2691, in load_entry_point
return ep.load()
File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2322, in load
return self.resolve()
File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2328, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/bml1g12/pyannote-audio/pyannote/audio/applications/speech_detection.py", line 180, in <module>
from pyannote.audio.labeling.base import SequenceLabeling
File "/home/bml1g12/pyannote-audio/pyannote/audio/labeling/base.py", line 41, in <module>
import torch.nn as nn
File "/home/bml1g12/anaconda3/envs/pyannote-devel/lib/python3.6/site-packages/torch/__init__.py", line 78, in <module>
from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS
This is apparently a known issue, with workarounds involving shuffling import order, but I guess you aren't experiencing this problem, so I wonder if you may be using a different version of torch?
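For the record, the workaround I keep seeing suggested (which I have not tried yet) is to make sure torch is imported before anything that pulls in TensorFlow, e.g. at the very top of the entry script:
import torch  # must come before TensorFlow/Keras to dodge the static TLS limit
import keras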
Hmmmm. Actually, only the speaker embedding part really uses pytorch. All other modules (speech activity detection, speaker change detection) still rely on Keras.
I guess, getting rid of the "import torch" stuff should do the trick (at least for speech activity detection and speaker change detection).
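Something along these lines in pyannote/audio/labeling/base.py should be enough (just a sketch, untested):
try:
    import torch.nn as nn
except ImportError:
    nn = None  # only the speaker embedding part actually needs torch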
FYI, I am currently rewriting the whole library to only use pytorch (in a private repo for now). I hope to have it ready by Interspeech 2018 but cannot really promise an ETA...
Sorry for the slow reply; I've been testing a few different versions of pyannote and pip/conda packages to see what I can get working. Thanks for creating such a wonderful set of tools, and I look forward to the next version. In the meantime:
Using the develop branch, and after installing pytorch on a new Linux machine (so that "import torch.nn as nn" no longer produces an error), I seem to be able to train the model just fine. But when it comes to the tuning step, I find that tuning outputs only zeros (and uses only the CPU), producing this sort of output:
pyannote-speech-detection tune ${TRAIN_DIR} AMI.SpeakerDiarization.MixHeadset
Iteration No: 1 ended. Evaluation done at provided point.
Time taken: 1026.3768
Function value obtained: 0.0000
Current minimum: 0.0000
And subsequent test results from pyannote-metrics show very low accuracy. Do you have any idea what could be going wrong?
I managed to get the Speaker Embedding functionality working (and with GPU support) using a specific combination of pip/conda libraries. I've included a pip file and a conda file so that other users can reproduce this environment. To use them, simply follow these steps:
1. conda create -n new_env --file env-conda.txt # installs pip, as specified in the env-conda.txt file
2. source activate new_env # now pip is in your path
3. pip install -r env-pip.txt
Attachments: env-conda.txt, env-pip.txt. I also needed to manually install some missing libraries.
I just started working again on change detection on AMI and came to the same conclusion.
You might want to lower the --purity constraint when tuning, because the default value (0.9 = 90% purity) is probably a bit too high for AMI (for which the manual annotation of speech turn boundaries is not precise enough).
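Something along these lines (the exact flag syntax may differ slightly):
pyannote-change-detection tune --purity=0.8 ${TRAIN_DIR} AMI.SpeakerDiarization.MixHeadset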
As for using only the CPU, this is somewhat expected, as tuning only consists of trying several peak detection thresholds and then using pyannote.metrics to evaluate each one. The GPU is only used once per epoch, to extract the raw speaker change detection scores. Future versions should (at least) use all CPU cores (i.e. process one file per core).
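Conceptually, the tuning loop amounts to something like the sketch below. This is not the actual pyannote-audio code: the scores, the reference, and the binarize helper are all made up for illustration; only pyannote.core and pyannote.metrics are used as-is.
import numpy as np
from pyannote.core import Annotation, Segment
from pyannote.metrics.detection import DetectionErrorRate

def binarize(scores, step=0.01, threshold=0.5):
    # turn frame-level speech scores into 'speech' segments
    hypothesis, start = Annotation(), None
    for i, score in enumerate(scores):
        t = i * step
        if score >= threshold and start is None:
            start = t
        elif score < threshold and start is not None:
            hypothesis[Segment(start, t)] = 'speech'
            start = None
    if start is not None:
        hypothesis[Segment(start, len(scores) * step)] = 'speech'
    return hypothesis

reference = Annotation()                 # ground truth (made up here)
reference[Segment(0.0, 2.0)] = 'speech'
scores = np.random.rand(500)             # stand-in for raw network scores

metric = DetectionErrorRate()
rates = {theta: metric(reference, binarize(scores, threshold=theta))
         for theta in np.linspace(0.1, 0.9, 9)}
print('best threshold:', min(rates, key=rates.get))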
Thanks, that's useful to know, but the tuning issue I was referring to (with "Function value obtained: 0.0000") was actually for the pyannote-speech-detection tune command, which doesn't have a purity option, so maybe it's not possible to use pyannote-speech-detection on AMI at the moment?
By the way, in this paper I noticed they found some errors in the AMI labelling, particularly the timings of the speech turn boundaries. They did a forced alignment to try to fix these errors and share the resulting files on that GitHub too. I'm not sure whether I should use them, as it's tough to know whether their forced alignment could be generating more errors than it solves!
I just released version 1.0 of pyannote.audio -- which is an almost complete rewrite. Feel free to re-open this issue in case it remains unsolved in v1.0.
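Upgrading should just be a matter of:
pip install --upgrade pyannote.audio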
Ubuntu 14.04.5 LTS
I'm following the tutorial here: https://github.com/pyannote/pyannote-audio/tree/master/tutorials/speech-activity-detection
And have followed the installation steps, but using the AMI database and Python 3.6, because the pyannote.database module requires f-strings (Python 3.6+).
My ~/.pyannote/config.yml reads:
AMI: /path_to_ami_corpus/amicorpus/*/audio/{uri}.wav
which I hope is the correct way of allowing pyannote.database to scan the amicorpus. It seemed to work for the pyannote-speech-feature command. When I run:
pyannote-speech-detection train ${EXPERIMENT_DIR} AMI.SpeakerDiarization.MixHeadset
I get the following traceback:
The same error applies to pyannote-change-detection, and I would be very grateful for any ideas on what the solution could be.
I'm using the same config.yml in EXPERIMENT_DIR as the tutorial. I notice that it doesn't include the number of MFCCs, so does it assume you've already calculated them using the pyannote-speech-feature command? If so, how do I specify the path of the extracted MFCCs? Thanks for any help! (I've pasted a guess below.)
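My guess, and it is pure guesswork on my part, mirroring the kind of feature_extraction block pyannote-speech-feature itself uses, was something like:
feature_extraction:
  name: YaafeMFCC
  params:
    coefs: 11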