Looking to Listen

This is an implementation of "Looking to Listen at the Cocktail Party" in Python 3 and Chainer. This deep learning technique can be applied to noise reduction, background music removal, and speech separation.

The original paper is available at arxiv.org/abs/1804.03619. This implementation is inspired by crystal-method (MIT).

Quick Start Demonstration (Audio-only Noise Reduction)

This section demonstrates noise reduction using a pretrained model.

First, build the Docker container:
```
$ docker-compose build
```
Put the noisy audio file(s) in ./data/noise, then run one of the following commands.

GPU

```
$ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise
```

CPU (first comment out _set_gpu() in network/src/env.py)

Intel CPU (fast)
```
$ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise -ideep
```
Other CPU (slow)
```
$ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise
```

The denoised audio is written to ./data/results.
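If you want to script the quick start, the three variants above differ only in the optional -ideep flag (the GPU and plain-CPU commands are identical; the CPU switch is the edit to network/src/env.py). The following is a minimal sketch, assuming docker-compose is on your PATH and the script is run from the repository root. `quick_start_argv` and `run_quick_start` are hypothetical helpers, not part of this repository. It only prints the command unless `dry_run=False`.

```python
import shlex
import subprocess

# Container-side paths copied from the quick-start commands in this README.
MODEL = "/data/model/0f_1sclean_noise.npz"
NOISE_DIR = "/data/noise"


def quick_start_argv(model=MODEL, noise_dir=NOISE_DIR, ideep=False):
    """Build the docker-compose argv for the audio-only quick start.

    Hypothetical helper (not in the repository). GPU and plain-CPU runs use
    the same command; for CPU, _set_gpu() in network/src/env.py must be
    commented out beforehand.
    """
    argv = [
        "docker-compose", "run", "network",
        "python3", "quick_start_audio_only.py", model, noise_dir,
    ]
    if ideep:
        # The README's fast variant for Intel CPUs.
        argv.append("-ideep")
    return argv


def run_quick_start(ideep=False, dry_run=True):
    """Print the quick-start command and, unless dry_run, execute it."""
    argv = quick_start_argv(ideep=ideep)
    print("$ " + shlex.join(argv))  # shlex.join needs Python 3.8+
    if not dry_run:
        # Raises subprocess.CalledProcessError if the container exits non-zero.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    run_quick_start(ideep=False, dry_run=True)
```

Passing the command as a list rather than a single shell string avoids quoting problems if your model or data paths ever contain spaces.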

Usage

Refer to the following sections for additional tasks such as speech separation and audio-visual processing.

Open a bash shell in a container

```
$ docker-compose run preprocess bash
$ docker-compose run dataset bash
$ docker-compose run network bash
```

Differences from the original paper

The original paper uses a large fully connected (FC) layer, and the full network does not fit in the memory of a single GPU. This implementation therefore reduces the size of the FC layer so that the network fits on one GPU.
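An FC layer stores one weight per input/output pair plus one bias per output, so its parameter count grows with the product of its input and output widths, and shrinking the output width shrinks the weight memory proportionally. The following minimal Python sketch shows that arithmetic. The layer widths in it are hypothetical and chosen only to illustrate the scaling; they are not the sizes used by the paper or by this repository.

```python
def fc_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Trainable parameters in a fully connected (linear) layer."""
    weights = in_features * out_features
    return weights + (out_features if bias else 0)


def fc_bytes(in_features: int, out_features: int, bytes_per_param: int = 4) -> int:
    """Memory for the layer's parameters alone (float32 by default).

    Training needs more: gradients take the same amount again, and Adam
    keeps two extra buffers per parameter, so roughly 4x with Adam,
    before counting activations.
    """
    return fc_params(in_features, out_features) * bytes_per_param


if __name__ == "__main__":
    # Hypothetical widths for illustration only (NOT the paper's or this repo's).
    in_features = 10_000
    for out_features in (4_000, 2_000, 1_000):
        n = fc_params(in_features, out_features)
        mib = fc_bytes(in_features, out_features) / 2**20
        print(f"{in_features} -> {out_features}: {n:,} params, {mib:.1f} MiB (weights only)")
```

Because the count is linear in each width, halving the FC output width roughly halves that layer's parameter memory, which is the kind of reduction described above.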

External Libraries

The following external libraries are included in preprocess/src/libs:

Facenet (MIT)

meokz / looking-to-listen
