ubclaunchpad / minutes

:telescope: Speaker diarization via transfer learning
https://medium.com/ubc-launch-pad-software-engineering-blog/speaker-diarisation-using-transfer-learning-47ca1a1226f4

Chad/#103 transition minutes from api to library #107

Closed chadlagore closed 6 years ago

chadlagore commented 6 years ago

Handles #103, #108, #111.

:construction_worker: Changes

Typical Usage

from minutes import Speaker, Minutes

minutes = Minutes(ms_per_observation=500, model='cnn')

# Create some speakers with some audio.
speaker1 = Speaker('speaker1')
speaker1.add_audio('path/to/audio1.wav')

speaker2 = Speaker('speaker2')
speaker2.add_audio('path/to/audio2.wav')

# Add speakers to the model.
minutes.add_speakers([speaker1, speaker2])

# Fit the model.
minutes.fit()  # Currently breaks (have to refit base model).
result = minutes.predict()

Rebuilding the Base Model With New Speakers

(Bring Your Own GPU)

from minutes import Speaker, Minutes
from minutes.base import BaseModel

model = BaseModel('my_cnn_base', ms_per_observation=500)

# Create some speakers with some (large) audio.
speaker1 = Speaker('speaker1')
speaker1.add_audio('path/to/audio1.wav')

speaker2 = Speaker('speaker2')
speaker2.add_audio('path/to/audio2.wav')

# Add speakers to the base model.
model.add_speaker(speaker1)
model.add_speaker(speaker2)

# Fit the model.
model.fit()  # Prints validation results....
model.save()

# Use the new base model.
minutes = Minutes(model='my_cnn_base')
# ... add speakers, predict etc.

:flashlight: Testing Instructions

For now,

$ py.test -vvv --cov=minutes test

Let's keep the coverage up!
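
As an example, a minimal smoke test in that suite might look like this (the test body is a hypothetical sketch; it only exercises the Speaker API shown above against one of the checked-in fixtures):

from minutes import Speaker

def test_speaker_add_audio():
    # Hypothetical smoke test: constructing a speaker and attaching
    # a checked-in fixture should not raise.
    speaker = Speaker('sample1')
    speaker.add_audio('test/fixtures/sample1.wav')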

Repo Layout

.
├── README.md
├── bld.bat
├── build.sh
├── environment.yml
├── meta.yaml
├── minutes
│   ├── __init__.py
│   ├── audio.py
│   ├── base.py
│   ├── conversation.py
│   ├── minutes.py
│   ├── models
│   │   ├── __init__.py
│   │   └── cnn.h5
│   └── speaker.py
├── setup.py
└── test
    ├── __init__.py
    ├── config.py
    ├── fixtures
    │   ├── sample1.wav
    │   └── sample2.wav
    ├── test_audio.py
    ├── test_base.py
    ├── test_minutes.py
    └── test_speaker.py

Base Model(s)

base_model = BaseModel('taco', ms_per_observation=3000)
speaker1 = Speaker('4640')
speaker2 = Speaker('8098')
speaker3 = Speaker('441')

speaker1.add_audio('test/fixtures/4640')
speaker2.add_audio('test/fixtures/8098')
speaker3.add_audio('test/fixtures/441')

base_model.add_speaker(speaker1)
base_model.add_speaker(speaker2)
base_model.add_speaker(speaker3)

base_model.fit(verbose=2)
...
Epoch 45/50
 - 3s - loss: 0.6708 - acc: 0.8345 - val_loss: 0.6273 - val_acc: 0.9076
Epoch 46/50
 - 3s - loss: 0.6404 - acc: 0.8414 - val_loss: 0.6115 - val_acc: 0.9076
Epoch 47/50
 - 3s - loss: 0.6175 - acc: 0.8454 - val_loss: 0.5964 - val_acc: 0.9076
Epoch 48/50
 - 3s - loss: 0.5998 - acc: 0.8553 - val_loss: 0.5815 - val_acc: 0.9076
Epoch 49/50
 - 3s - loss: 0.6101 - acc: 0.8583 - val_loss: 0.5678 - val_acc: 0.9056
Epoch 50/50
 - 3s - loss: 0.5849 - acc: 0.8632 - val_loss: 0.5533 - val_acc: 0.9137
base_model.model.save('minutes/models/cnn.h5')
base_model.model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_3 (Conv1D)            (None, 49, 32)            219168    
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 24, 32)            0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 24, 32)            0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 768)               0         
_________________________________________________________________
dense_21 (Dense)             (None, 128)               98432     
_________________________________________________________________
dense_22 (Dense)             (None, 3)                 387       
=================================================================
Total params: 317,987
Trainable params: 317,987
Non-trainable params: 0
_________________________________________________________________
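
For reference, a Keras sketch that reproduces the layer stack in the summary above. The input shape (112, 107), kernel size 64, dropout rate, activations, and compile settings are reverse-engineered assumptions, chosen only because they reproduce the listed output shapes and parameter counts; the real dimensions come from the preprocessing code.

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dropout, Flatten, Dense

model = Sequential([
    # (64 * 107 + 1) * 32 = 219,168 params; output shape (49, 32).
    Conv1D(32, kernel_size=64, activation='relu', input_shape=(112, 107)),
    MaxPooling1D(pool_size=2),        # -> (24, 32)
    Dropout(0.5),                     # assumed rate
    Flatten(),                        # -> (768,)
    Dense(128, activation='relu'),    # 768 * 128 + 128 = 98,432 params
    Dense(3, activation='softmax'),   # one unit per speaker; 387 params
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])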

FLAC-to-WAV Conversion

Rather crudely, I did something like:

import glob
import os
from subprocess import call

flac_files = glob.glob('test/fixtures/5561' + '/**/*.flac', recursive=True)

for file in flac_files:
    # os.path.splitext swaps the extension safely; str.strip('.flac')
    # would eat matching characters from both ends of the path.
    call(["ffmpeg", "-i", file, os.path.splitext(file)[0] + '.wav'])

This is obviously not an option for the library. We should find a way to read in .flac files properly (#110).
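
One way to avoid the ffmpeg round-trip entirely (an assumption on my part, not necessarily what #110 lands on): the soundfile package reads FLAC natively via libsndfile.

import soundfile as sf

# Read FLAC directly into a NumPy array plus sample rate;
# 'path/to/audio.flac' is a placeholder path.
data, rate = sf.read('path/to/audio.flac')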

Other Notes

Epoch 12/15
 - 1s - loss: 0.7596 - acc: 0.8963 - val_loss: 0.7163 - val_acc: 0.9102
Epoch 13/15
 - 1s - loss: 0.7092 - acc: 0.8872 - val_loss: 0.6522 - val_acc: 0.9224
Epoch 14/15
 - 1s - loss: 0.6473 - acc: 0.8943 - val_loss: 0.5936 - val_acc: 0.9163
Epoch 15/15
 - 2s - loss: 0.5940 - acc: 0.9023 - val_loss: 0.5443 - val_acc: 0.9286
iKevinY commented 6 years ago

What code is generating the two spectrograms that are in your PR description? I'm curious why so much of the second one is purple (what information is it actually encoding, compared to the first?) 😮

chadlagore commented 6 years ago

That's scipy.signal.spectrogram (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.spectrogram.html); mode="phase" produces the first spectrogram. I agree that it's rather surprising that the purple spectrogram learns at all. #114 will let users configure this a bit more.
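
For anyone reproducing the figures, a minimal sketch comparing the default PSD mode against mode="phase" (the sine input and parameters here are placeholders, not the audio or settings used in the PR):

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

fs = 16000                        # assumed sample rate
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)   # stand-in for real speech

fig, axes = plt.subplots(2, 1, sharex=True)
for ax, mode in zip(axes, ['psd', 'phase']):
    f, seg_t, Sxx = spectrogram(x, fs=fs, mode=mode)
    ax.pcolormesh(seg_t, f, Sxx)
    ax.set_title(f"mode='{mode}'")
    ax.set_ylabel('Hz')
axes[-1].set_xlabel('s')
plt.tight_layout()
plt.show()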