ubclaunchpad / minutes

:telescope: Speaker diarization via transfer learning
https://medium.com/ubc-launch-pad-software-engineering-blog/speaker-diarisation-using-transfer-learning-47ca1a1226f4

Add Prediction Output #131

chadlagore opened 6 years ago

chadlagore commented 6 years ago

:construction_worker: Changes

:flashlight: Testing Instructions

```
pytest -vvv
```

Follow README instructions.

iKevinY commented 6 years ago

Hmm, for some reason the PR build is passing but the actual commit build is failing? 🤔

chadlagore commented 6 years ago

Yeah. The failure means that the model build is not deterministic, which may not be a problem in itself, but it affects the way the tests may be written. I can run that test 100 times and view the output to be sure.

chadlagore commented 6 years ago

Posting a fix for this today!

chadlagore commented 6 years ago
```
~/git/minutes chad/126-prediction-output*
minutes-c2ZPuskd ❯ pytest test/test_minutes.py
=========================== test session starts ===========================
platform darwin -- Python 3.6.4, pytest-3.5.1, py-1.5.3, pluggy-0.6.0
rootdir: /Users/chadlagore/git/minutes, inifile:
plugins: cov-2.5.1
collected 3 items

test/test_minutes.py ..F                                            [100%]

================================ FAILURES =================================
______________________________ test_phrases _______________________________

    def test_phrases():
        for model_name in Minutes.parents:
            minutes = Minutes(parent=model_name)
            minutes.add_speaker(c.SPEAKER1)
            minutes.add_speaker(c.SPEAKER2)
            minutes.fit()

            # Predict new phrases (make sure we only predict once per obs)
            conversation = Conversation(c.CONVERSATION_AUDIO, minutes)
            raw, _ = conversation.get_observations(**minutes.preprocessing_params)
            assert len(conversation.phrases) == len(raw)
            print(conversation.phrases)

            # Make sure we only predicted on speaker 1 and 2.
            names = [p.speaker.name for p in conversation.phrases]
>           assert sorted(list(np.unique(names))) == ['speaker1', 'speaker2']
E           AssertionError: assert ['speaker2'] == ['speaker1', 'speaker2']
E             At index 0 diff: 'speaker2' != 'speaker1'
E             Right contains more items, first extra item: 'speaker2'
E             Use -v to get the full diff

test/test_minutes.py:37: AssertionError
-------------------------- Captured stdout call ---------------------------
[<minutes.conversation.Phrase object at 0x1201b7278>, <minutes.conversation.Phrase object at 0x1201b71d0>, <minutes.conversation.Phrase object at 0x1201b7978>, <minutes.conversation.Phrase object at 0x1201b7a58>, <minutes.conversation.Phrase object at 0x1201b7240>, <minutes.conversation.Phrase object at 0x1201b7940>, <minutes.conversation.Phrase object at 0x1201b7780>, <minutes.conversation.Phrase object at 0x1201b79e8>, <minutes.conversation.Phrase object at 0x1201b7a20>, <minutes.conversation.Phrase object at 0x1201b7898>]
=================== 1 failed, 2 passed in 7.35 seconds ====================

~/git/minutes chad/126-prediction-output* 10s
minutes-c2ZPuskd ❯ pytest test/test_minutes.py
=========================== test session starts ===========================
platform darwin -- Python 3.6.4, pytest-3.5.1, py-1.5.3, pluggy-0.6.0
rootdir: /Users/chadlagore/git/minutes, inifile:
plugins: cov-2.5.1
collected 3 items

test/test_minutes.py ...                                            [100%]

======================== 3 passed in 5.44 seconds =========================
```

I've replicated this locally. It's as we expected: the test is non-deterministic because we train the model each time!

Problem

We provide our model with a `random_state` parameter, which `_generate_training_data` uses to do a train/test split. Ideally it would lock down the Keras model as well, but it's not that simple: Keras draws on both the NumPy seed and the TensorFlow seed, and each of those is set globally.

```python
from numpy.random import seed
from tensorflow import set_random_seed

seed(random_state)
set_random_seed(random_state)
```
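Note that even with both seeds pinned, bit-for-bit reproducibility still isn't guaranteed: some TensorFlow operations (multi-threaded CPU kernels, certain GPU reductions) are nondeterministic regardless of seeding.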

Options

  1. Admit that minutes cannot guarantee reproducibility: use `random_state` to generate the training data, then let users set the global NumPy state themselves if they want. This will cause confusion if someone sets the random state of the Minutes model and expects some sort of stability in the answer.
  2. Save the NumPy state, call the Keras functions, and restore the NumPy state afterwards, wrapped up in a decorator (see the sketch below). This makes fitting Minutes models non-thread-safe 🤔.
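To make option 2 concrete, here's a minimal sketch (hypothetical, not in the codebase) of a decorator that snapshots the global NumPy RNG state, seeds it for the duration of the call, and restores it afterwards. There is no equivalent save/restore for TensorFlow's graph-level seed, so this only covers the NumPy half:

```python
import functools

import numpy as np


def with_numpy_seed(random_state):
    """Hypothetical decorator: run the wrapped function under a fixed
    NumPy seed, then restore whatever global state the caller had."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            saved = np.random.get_state()   # snapshot the global RNG state
            np.random.seed(random_state)    # deterministic section
            try:
                return fn(*args, **kwargs)
            finally:
                np.random.set_state(saved)  # hand the caller's state back
        return wrapper
    return decorator
```

Because the `np.random` state is process-global, two threads fitting models concurrently would still clobber each other mid-fit, which is exactly the thread-safety concern above.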
chadlagore commented 6 years ago
  1. Inform the user that they should set `np.random.seed` and `tf.set_random_seed` themselves if they want reproducibility.

This seems like the simplest solution for now!
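For illustration, the user-side opt-in might look something like this; the `Minutes` calls are copied from the test transcript above, while the import path and the seed/speaker values are placeholders:

```python
import numpy as np
from tensorflow import set_random_seed  # TF 1.x API, as discussed above

from minutes import Minutes  # assumed import path

# Pin both global seeds before any model construction or fitting.
np.random.seed(42)
set_random_seed(42)

minutes = Minutes(parent=model_name)  # model_name as in the test above
minutes.add_speaker(speaker1)         # speakers as in the test fixtures
minutes.add_speaker(speaker2)
minutes.fit()
```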