gvne / spleeterpp

A C++ Inference library for the Spleeter project
MIT License
162 stars, 33 forks

Allow working on spectrograms #1

Closed. gvne closed this issue 4 years ago

gvne commented 4 years ago

In spleeter, we can easily add to or change the saved model outputs to expose the spectrogram.
This change in spleeter adds an [instrument]_stft output for each instrument:

diff --git a/spleeter/model/__init__.py b/spleeter/model/__init__.py
index 384e838..abb403f 100644
--- a/spleeter/model/__init__.py
+++ b/spleeter/model/__init__.py
@@ -302,8 +302,10 @@ class EstimatorSpecBuilder(object):
             instrument_mask = instrument_mask[
                 :tf.shape(stft_feature)[0], ...]
             # Compute masked STFT and normalize it.
+            output_waveform[f'{instrument}_stft'] = \
+                tf.cast(instrument_mask, dtype=tf.complex64) * stft_feature
             output_waveform[instrument] = self._inverse_stft(
-                tf.cast(instrument_mask, dtype=tf.complex64) * stft_feature)
+                output_waveform[f'{instrument}_stft'])
         return output_waveform

     def _build_output_waveform(self, output_dict):

The output of saved_model_cli is now:

...
  The given SavedModel SignatureDef contains the following output(s):
    outputs['accompaniment'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: strided_slice_23:0
    outputs['accompaniment_stft'] tensor_info:
        dtype: DT_COMPLEX64
        shape: (-1, 2049, 2)
        name: mul_4:0
    outputs['audio_id'] tensor_info:
        dtype: DT_STRING
        shape: unknown_rank
        name: Placeholder_1:0
    outputs['vocals'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: strided_slice_13:0
    outputs['vocals_stft'] tensor_info:
        dtype: DT_COMPLEX64
        shape: (-1, 2049, 2)
        name: mul_1:0
  Method name is: tensorflow/serving/predict

We should use the same principle to export models that take spectrograms as input instead of waveforms.
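
As a minimal NumPy sketch of the masking step in the diff above: the real-valued mask is cast to complex and multiplied element-wise with the mixture STFT, which gives the tensor exported as [instrument]_stft. The function name `apply_mask`, the random input data and the complementary masks are assumptions made for illustration; they are not spleeter code.

```python
import numpy as np

def apply_mask(mask, stft):
    """Multiply a real-valued mask with a complex STFT, element-wise."""
    return mask.astype(np.complex64) * stft

# Illustrative shapes matching the signature above: (frames, 2049 bins, 2 channels).
rng = np.random.default_rng(0)
shape = (4, 2049, 2)
stft = (rng.standard_normal(shape)
        + 1j * rng.standard_normal(shape)).astype(np.complex64)

# Hypothetical masks; a real model would predict these.
vocals_mask = rng.uniform(0.0, 1.0, size=shape).astype(np.float32)
accompaniment_mask = 1.0 - vocals_mask

vocals_stft = apply_mask(vocals_mask, stft)
accompaniment_stft = apply_mask(accompaniment_mask, stft)
```

Because the two masks sum to one in this sketch, the two masked STFTs add back up to the mixture STFT, up to floating-point rounding.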

gvne commented 4 years ago

The best way may be to edit the model_fn function.
This snippet produces an exported model that takes an STFT as input and returns one mask per instrument (two for the 2stems model):

import os
import json
import tempfile
import shutil

import numpy as np

import tensorflow as tf
from tensorflow.contrib import predictor, signal

from spleeter.model import get_model_function
from spleeter.utils.tensor import pad_and_partition, pad_and_reshape

WINDOW_COMPENSATION_FACTOR = 2./3.
EPSILON = 1e-10

# -----------------
def model_fn(features, labels, mode, params, config):
    if mode != tf.estimator.ModeKeys.PREDICT:
        raise Exception("This script only supports prediction")
    # self._build_stft_feature()
    stft_feature = features['stft']
    mix_spectrogram = \
        tf.abs(pad_and_partition(stft_feature, params['T']))[:, :, :params['F'], :]

    # output_dict = self._build_output_dict()
    try:
        model_type = params['model']['type']
    except KeyError:
        model_type = 'unet.unet'  # default model

    output_dict = get_model_function(model_type)(
        mix_spectrogram,
        params['instrument_list'],
        params['model']['params']
    )

    # output_waveform = self._build_output_waveform(output_dict)
    separation_exponent = params['separation_exponent']
    output_sum = tf.reduce_sum(
        [e ** separation_exponent for e in output_dict.values()],
        axis=0
    ) + EPSILON
    output_mask = {}
    for instrument in params['instrument_list']:
        output = output_dict[instrument + '_spectrogram']
        # compute mask with the model
        instrument_mask = (
            output ** separation_exponent
            + (EPSILON / len(output_dict))) / output_sum

        output_mask[instrument] = instrument_mask

    return tf.estimator.EstimatorSpec(
        tf.estimator.ModeKeys.PREDICT,
        predictions=output_mask)
#------------------

extraction_type = "2stems"
temp_directory = "exported_stft_model"  # tempfile.mkdtemp()

# read the json parameters
with open(os.path.join("spleeter/resources", extraction_type + ".json")) as parameter_file:
    parameters = json.load(parameter_file)
parameters['MWF'] = False  # default parameter

# create the estimator
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir=os.path.join("pretrained_models", extraction_type),
    params=parameters,
    config=tf.estimator.RunConfig(session_config=tf.compat.v1.ConfigProto())
)

def receiver():
    features = {
        'stft': tf.compat.v1.placeholder(
            # integer division: a shape dimension must be an int, not a float
            tf.complex64, shape=(None, parameters['frame_length'] // 2 + 1, parameters['n_channels']))
    }
    return tf.estimator.export.ServingInputReceiver(features, features)

# export the estimator into a temp directory
estimator.export_saved_model(temp_directory, receiver)
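
The mask arithmetic inside model_fn can be checked in isolation with plain NumPy. The sketch below mirrors the formula from the snippet (each instrument's estimate raised to the separation exponent, divided by the sum over all instruments). The helper name `ratio_masks`, the default exponent of 2 and the toy arrays are assumptions for illustration only.

```python
import numpy as np

EPSILON = 1e-10

def ratio_masks(spectrograms, separation_exponent=2):
    """Turn per-instrument magnitude estimates into masks that sum to one."""
    powered = {name: spec ** separation_exponent
               for name, spec in spectrograms.items()}
    # EPSILON keeps the denominator non-zero where every estimate is zero.
    total = sum(powered.values()) + EPSILON
    return {name: (p + EPSILON / len(powered)) / total
            for name, p in powered.items()}

# Toy magnitude estimates (hypothetical values).
estimates = {
    "vocals": np.array([[1.0, 0.0], [3.0, 0.0]]),
    "accompaniment": np.array([[1.0, 2.0], [4.0, 0.0]]),
}
masks = ratio_masks(estimates)
```

Spreading EPSILON evenly over the numerators means that bins where every estimate is zero get an equal share per instrument instead of a division by zero.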

gvne commented 4 years ago

A very first implementation is available on branch issue/1.

gvne commented 4 years ago

I opened an issue on spleeter, and the answers explained a lot of things!

I need to provide the 'T' parameter in the export script, plus an extra overlap parameter that defines how many FFT frames overlap between consecutive T-frame windows.
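
One possible reading of the T and overlap parameters, sketched in NumPy: the STFT is cut into windows of T frames, and each window shares `overlap` frames with the previous one. The function name `partition_with_overlap`, the zero-padding of the last window and the toy sizes are assumptions; the implementation on branch issue/1 may differ.

```python
import numpy as np

def partition_with_overlap(stft, T, overlap):
    """Split (frames, bins, channels) into T-frame windows sharing `overlap` frames."""
    if not 0 <= overlap < T:
        raise ValueError("overlap must satisfy 0 <= overlap < T")
    hop = T - overlap
    n_frames = stft.shape[0]
    # Number of windows needed so that every frame is covered at least once.
    n_windows = max(1, int(np.ceil(max(n_frames - overlap, 1) / hop)))
    padded_len = (n_windows - 1) * hop + T
    # Zero-pad along the frame axis only.
    pad = [(0, padded_len - n_frames)] + [(0, 0)] * (stft.ndim - 1)
    padded = np.pad(stft, pad)
    return np.stack([padded[i * hop:i * hop + T] for i in range(n_windows)])

# Toy input: 10 frames, 1 bin, 1 channel, frame index stored as the value.
stft = np.arange(10, dtype=np.complex64).reshape(10, 1, 1)
windows = partition_with_overlap(stft, T=4, overlap=1)
```

With these toy sizes the windows cover frames 0-3, 3-6 and 6-9, so frame 3 ends the first window and starts the second.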

gvne commented 4 years ago

A test was developed on branch issue/1 with a proper implementation of the online filter working on the spectrogram. A TensorFlow bug prevents compiling in release mode, but #10 should fix this. Even in debug mode, the filter easily runs in real time.

gvne commented 4 years ago

Updated the filter to use the TensorFlow C API instead of tensorflow_cc. The library should offer both the filter interface AND the classic file-based processing. We need to adapt the build system accordingly.

gvne commented 4 years ago

Integrated into the develop branch.
Split the root library into spleeter and spleeter_common, and added a spleeter_filter library. The filter library provides an online interface to the project.