khaotik / DaNet-Tensorflow

Tensorflow implementation of "Speaker-independent Speech Separation with Deep Attractor Network"
MIT License

DaNet-Tensorflow

Link to original paper

2021 Note: I am NOT the original author of the paper. This code runs but won't learn well, and I have no time to work on it. If you manage to get the models working, let me know.

STILL WORK IN PROGRESS, EXPECT BUGS

Requirements

numpy / scipy

tensorflow >= 1.2

matplotlib (optional, for visualization)

h5py / fuel (optional, for certain datasets)

Usage

Prepare datasets

Currently, the TIMIT and WSJ0 datasets are implemented. You can use the "toy" dataset for debugging; it is just white noise.

Follow app/datasets/TIMIT/readme for dataset preparation.

Follow app/datasets/WSJ0/readme for dataset preparation.

After setting up a dataset, you may want to change DATASET_TYPE in hyperparameters.

Setup hyperparameters

Hyperparameters control the batch size, learning rate, dataset type, and so on.

There's a default.json file at the root directory. You can make your own file and change some of the values. For example, you can create a JSON file with:

{
    "DATASET_TYPE": "timit",
    "LR": 1e-2,
    "BATCH_SIZE": 8
}

Save it as my_setup.json. Now you can run the script with:

python main.py -c my_setup.json
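One plausible way for a custom file such as my_setup.json to be layered over default.json is a plain dictionary merge. The sketch below is an assumption about the mechanism, not the repository's actual loading code, and `load_hyperparams` is a hypothetical helper name.

```python
import json


def load_hyperparams(default_path, override_path=None):
    """Hypothetical helper: read defaults, then overlay user-supplied values."""
    with open(default_path) as f:
        hparams = json.load(f)
    if override_path is not None:
        with open(override_path) as f:
            # Keys present in the override file replace the defaults;
            # every other key keeps its default value.
            hparams.update(json.load(f))
    return hparams
```

Under this assumption, a custom file only needs to list the values that differ from the defaults.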

Some commonly used hyperparameters can be overridden by CLI args.

For example, to set learning rate:

python main.py -lr=1e-2

Here's an incomplete list of them:

# set learning rate, overrides LR
-lr
--learn-rate

# set dataset to use, overrides DATASET_TYPE
-ds
--dataset

# set batch size, overrides BATCH_SIZE
-bs
--batch-size

# set
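To illustrate how flags like these can take precedence over values from a JSON file, here is a minimal argparse sketch. The flag names come from the list above; the helper name `parse_overrides` and the merge logic are assumptions for illustration, not the code in main.py.

```python
import argparse


def parse_overrides(argv, hparams):
    """Hypothetical helper: apply CLI flags on top of a hyperparameter dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-lr', '--learn-rate', type=float, default=None)
    parser.add_argument('-ds', '--dataset', type=str, default=None)
    parser.add_argument('-bs', '--batch-size', type=int, default=None)
    args = parser.parse_args(argv)

    merged = dict(hparams)  # copy so the caller's dict is left untouched
    # Only flags that were actually passed replace the configured values.
    if args.learn_rate is not None:
        merged['LR'] = args.learn_rate
    if args.dataset is not None:
        merged['DATASET_TYPE'] = args.dataset
    if args.batch_size is not None:
        merged['BATCH_SIZE'] = args.batch_size
    return merged
```

For example, `parse_overrides(['-lr=1e-2'], hparams)` would change only the `LR` entry and leave the other keys as configured.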

Note: If you get an out of memory (OOM) error from TensorFlow, try a lower BATCH_SIZE.

Note: If you change FFT_SIZE, FFT_STRIDE, FFT_WND, or SMP_RATE, you should run dataset preprocessing again.

Note: If you change the model architecture, previously saved model parameters may no longer be compatible.
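The reason FFT settings force a new preprocessing pass is that the shape of a stored spectrogram depends on them. The NumPy sketch below shows this under stated assumptions: FFT_SIZE is treated as the frame length, FFT_STRIDE as the hop between frames, the signal is not padded, and a Hann window is used purely for illustration. The repository's own preprocessing may differ in all of these.

```python
import numpy as np


def spectrogram_shape(n_samples, fft_size, fft_stride):
    """Return (frames, frequency bins) for an unpadded real-input STFT."""
    n_frames = 1 + (n_samples - fft_size) // fft_stride
    n_bins = fft_size // 2 + 1
    return n_frames, n_bins


def stft_magnitude(signal, fft_size, fft_stride):
    """Frame the signal, apply a Hann window, and take the real FFT."""
    n_frames, _ = spectrogram_shape(len(signal), fft_size, fft_stride)
    window = np.hanning(fft_size)  # assumed window, for illustration only
    frames = np.stack([
        signal[i * fft_stride:i * fft_stride + fft_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))
```

The same one-second clip at 8000 samples per second gives a different array shape for each FFT setting, so spectrograms cached with the old values would no longer match what the model expects.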

Perform experiments

Under the root directory of this repo:

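    # Train on the TIMIT dataset
    python main.py -ds='timit'
    # Train with a custom hyperparameter file
    python main.py -c my_setup.json
    # Train for 100 epochs and save the parameters
    python main.py -ne=100 -o='params.ckpt'
    # Continue from the saved parameters for another 100 epochs
    python main.py -ne=100 -i='params.ckpt' -o='params.ckpt'
    # Run in test mode with the saved parameters
    python main.py -i='params.ckpt' -m=test
    # Demo mode: writes the mixture and the separated signals as WAV files
    $ python main.py -i='params.ckpt' -m=demo
    $ ls *.wav
    demo.wav demo_separated_1.wav demo_separated_2.wav
    # Demo mode on a given WAV file
    $ python main.py -i='params.ckpt' -m=demo -if=file.wav
    $ ls *.wav
    file.wav file_separated_1.wav file_separated_2.wav
    # Launch TensorBoard on the training logs
    tensorboard --logdir=./logs/
    # List all CLI arguments
    python main.py --help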

Use custom dataset

    @hparams.register_dataset('my_dataset')
    class MyDataset(Dataset):
        ...

You can use app/datasets/timit.py as a reference.

    import app.datasets.my_dataset

Customize model

You can make a subclass of Estimator, Encoder, or Separator to tweak the model.

You can set encoder type by setting ENCODER_TYPE in hyperparameters.

You can set estimator type by setting TRAIN_ESTIMATOR_METHOD and INFER_ESTIMATOR_METHOD in hyperparameters.

You can set separator type by setting SEPARATOR_TYPE in hyperparameters.

Make sure to use the matching @register_* decorator for your class. See the code in app/modules.py for details and for the existing sub-modules.

To change the overall model architecture, modify Model.build() in main.py.

Limitations