Add Layers to the Model for Longer Training Times

tsurumeso / vocal-remover

Vocal Remover using Deep Neural Networks

MIT License

1.55k stars 222 forks source link

Add Layers to the Model for Longer Training Times #71

Open Anjok07 opened 3 years ago

Anjok07 commented 3 years ago

Hello!

Would it be possible for you to make an update to the AI that adds layers to the model? It seems that it hit's a limit after about 4 days of training (the validation loss/training loss stagnates after about 4 days) (this is on a 650 pair dataset). I want to be able to train over the course of a longer time period to achieve better training/validation losses before a model reaches it's limit. Would this be possible?

Thank you in advance for your help!

404000 commented 3 years ago

Anjok07 Did the sound quality become better in version 4? Is version 4 better than version 2.2?

aufr33 commented 3 years ago

This is a really important question. We've managed to achieve incredible results in our beta fork: https://github.com/Anjok07/ultimatevocalremovergui/tree/v5-beta-cml. However, the current size of the model does not allow us to move on.

Anjok07 commented 3 years ago

We were able to increase the model size by doubling the channel size via the nets.py like so -

class CascadedASPPNet(nn.Module):

    def __init__(self, n_fft):
        super(CascadedASPPNet, self).__init__()
        self.stg1_low_band_net = BaseASPPNet(2, 32)
        self.stg1_high_band_net = BaseASPPNet(2, 32)

        self.stg2_bridge = layers.Conv2DBNActiv(34, 16, 1, 1, 0)
        self.stg2_full_band_net = BaseASPPNet(16, 32)

        self.stg3_bridge = layers.Conv2DBNActiv(66, 32, 1, 1, 0)
        self.stg3_full_band_net = BaseASPPNet(32, 64)

        self.out = nn.Conv2d(64, 2, 1, bias=False)
        self.aux1_out = nn.Conv2d(32, 2, 1, bias=False)
        self.aux2_out = nn.Conv2d(32, 2, 1, bias=False)

        self.max_bin = n_fft // 2
        self.output_bin = n_fft // 2 + 1

        self.offset = 128

We're training model with these settings now. What do you think of this appraoch?

tsurumeso commented 3 years ago

I think that's a good approach, but it increases GPU memory consumption significantly. Memory-efficient approaches are as follows:

Add an additional convolutional layer to Decoder.
Increase the filter size of some layers of BaseASPPNet (e.g. ksize=5, pad=2).

(Sorry, no guarantee for better accuracy)