Memory footprint in Google Colab

ghost commented 3 years ago

Thank you so much for this great model! Wonderful job! I have just a small question about the memory required for separation. The model seems to use a lot of memory: in Google Colab (the free version, because there is no Pro version for European users), it requires splitting the audio of a full song (anything longer than about 1 min to 1 min 30) and resampling the audio from high resolution (96000 Hz) to low resolution (44100 Hz).

The current Jupyter notebook only shows processing of very short samples (a YouTube video). I've slightly modified the code to allow using audio from Google Drive, but it seems to be limited to low-resolution, short-duration audio files unless the audio is split and merged in a separate step. The same RAM footprint limitation was resolved for Spleeter (Deezer) with a similar method, but with some constraints (zero padding that has to be removed from the audio); see the issue here: https://github.com/deezer/spleeter/issues/391#issuecomment-652202433.

Has anyone already done this?

ws-choi commented 3 years ago

Hi MaxC2, thanks for the feedback. As you mentioned, you have to resample the input file to a 44100 Hz audio file. I'll add some code for automatic resampling later.

However, you don't have to split and merge the audio manually. When you call the separate_track function of a pretrained model like

separated = model.separate_track(track.audio, 'vocals')

It automatically splits the given track into several sub-audio segments (each segment has the same number of samples, and the last one is zero-padded), separates the source for each segment, and merges all the separated outputs into a final audio file.

Below is the code for this.

    def separate_track(self, input_signal, target) -> torch.Tensor:

        import numpy as np

        self.eval()
        with torch.no_grad():
            # SingleTrackSet yields equal-length sub-audio segments of the track.
            db = SingleTrackSet(input_signal, self.hop_length, self.num_frame)
            assert target in db.source_names
            separated = []

            # Index of the target source, passed to the model as the condition.
            input_condition = np.array(db.source_names.index(target))
            input_condition = torch.tensor(input_condition, dtype=torch.long, device=self.device).view(1)

            for item in db:
                # Separate one segment, then trim trim_length samples from each end.
                separated.append(self.separate(item.unsqueeze(0).to(self.device), input_condition)[0]
                                 [self.trim_length:-self.trim_length].detach().cpu().numpy())

        # Merge the per-segment outputs into one signal.
        separated = np.concatenate(separated, axis=0)

        import soundfile
        soundfile.write('temp.wav', separated, 44100)
        return soundfile.read('temp.wav')[0]

The PyTorch dataset SingleTrackSet splits the given track automatically, on the fly.

After iterating over every sub-audio segment, separate_track merges all outputs with separated = np.concatenate(separated, axis=0).
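The split, zero-pad, and merge steps described above can be illustrated with a minimal NumPy sketch. It is not the actual SingleTrackSet implementation (which works with hop_length and num_frame and trims the edges of each segment); process_in_chunks, chunk_size, and the identity "model" are hypothetical names used only for illustration.

```python
import numpy as np


def process_in_chunks(signal, chunk_size, process_fn):
    """Split `signal` (n_samples, n_channels) into equal-length chunks,
    zero-pad the last one, process each chunk, and merge the results."""
    n_samples = signal.shape[0]
    n_chunks = -(-n_samples // chunk_size)  # ceiling division
    pad = n_chunks * chunk_size - n_samples

    # Zero-pad the tail so that every chunk has the same number of samples.
    padded = np.pad(signal, ((0, pad), (0, 0)))

    outputs = []
    for i in range(n_chunks):
        chunk = padded[i * chunk_size:(i + 1) * chunk_size]
        # Only one chunk is handed to the model at a time.
        outputs.append(process_fn(chunk))

    # Merge the per-chunk outputs and drop the zero-padding again.
    return np.concatenate(outputs, axis=0)[:n_samples]


# Stand-in for the model (hypothetical): returns the chunk unchanged.
track = np.random.rand(100_000, 2).astype(np.float32)
result = process_in_chunks(track, chunk_size=44_100, process_fn=lambda c: c)
```

Because only one chunk goes through the model at a time, the memory used by the model itself does not grow with the length of the song; the full input and output arrays still have to fit in RAM.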

Thank you.

ghost commented 3 years ago

Yes, it's probably because I tried to load 96 kHz audio with librosa (sr=96000) before calling separate_track and got kicked out of Google Colab for running out of RAM. I have retried with a 44.1 kHz file cut at ~1 min 30. Now I will test with a full song resampled to the right sample rate. Thank you very much for your support, and once again, well done on your great model!

ghost commented 3 years ago

OK, I've done some tests. The problem comes from the use of the embedded audio player, display(Audio(audio, rate=rate)), which seems to duplicate the audio in some way and uses a lot of RAM. So for a big audio file (e.g. longer than 10 minutes), you always get kicked out of Google Colab for exceeding the RAM limit.
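For a rough idea of the numbers involved, the size of the raw audio array can be estimated by simple arithmetic. The sketch below assumes stereo float32 samples (librosa's default dtype); estimate_audio_bytes is a hypothetical helper, and the actual overhead of the embedded player was not measured here.

```python
def estimate_audio_bytes(duration_s, sample_rate, channels=2, bytes_per_sample=4):
    """Size in bytes of an uncompressed audio array held in memory."""
    return int(duration_s * sample_rate * channels * bytes_per_sample)


ten_minutes = 10 * 60
hires = estimate_audio_bytes(ten_minutes, 96_000)   # 460_800_000 bytes
lowres = estimate_audio_bytes(ten_minutes, 44_100)  # 211_680_000 bytes
print(f"96 kHz: {hires / 1e6:.1f} MB, 44.1 kHz: {lowres / 1e6:.1f} MB")
```

Every additional copy of the signal (a resampled version, or an encoded copy kept for the notebook output) adds a comparable amount on top, which may explain why a long high-resolution file exhausts the RAM of a free Colab session.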

The trick is to skip the embedded audio preview and call separate_track directly.

In draft form, to use my audio stored in Google Drive, I've written two new cells. The first one is the usual Google Drive mount:

    from google.colab import drive
    drive.mount('/content/gdrive', force_remount=True)

The second one loads any audio file, resamples it (best filter quality), and converts it to stereo if needed. Each processed temp.wav is renamed and written to a destination subfolder in Google Drive (separated in my case) to make downloading easier (as a zip file).

    import os
    import shutil
    import librosa
    import numpy as np  # needed by load_audio (np.asfortranarray)
    import resampy

    gcolab_root = '/content/Conditioned-Source-Separation-LaSAFT/'
    gdrive_root = '/content/gdrive/My Drive/'

    destination_folder = 'separated'

    default_sample_rate = 44100
    sources = ['vocals', 'drums', 'bass', 'other']

    def load_audio(audio_path):
      audio, rate = librosa.load(audio_path, sr=None, mono=False)
      if rate != default_sample_rate:
        audio = resampy.resample(audio, rate, default_sample_rate, filter='kaiser_best')
      is_mono = audio.ndim == 1
      if is_mono:
        audio = np.asfortranarray(np.array([audio, audio]))
      return audio, rate, is_mono

    def separate_all_sources(audio, gdrive_path):
      for src in sources:
        print("separate '%s'" %src)
        model.separate_track(audio.T, src)
        shutil.copy(os.path.join(gcolab_root, 'temp.wav'), 
                    os.path.join(gdrive_path, src + '.wav'))

    # prepare google drive destination folder
    path = os.path.join(gdrive_root, destination_folder)
    try:
      os.makedirs(path, exist_ok = True)
    except OSError as error:
      print("Directory '%s' can not be created" %path)

    print('load audio source')
    audio_file = os.path.join(gdrive_root, 'audio/stairway/center.flac')
    audio, rate, is_mono = load_audio(audio_file)
    separate_all_sources(audio, path)
    print('finished')

I still need to add a fallback that restores the original audio format:

back to mono
back to original sample rate

But for the moment, this process works great on my audio files (>10 min, 96 kHz, 24-bit) without offline preprocessing.
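The format fallback mentioned above (back to mono, back to the original sample rate) could look roughly like the sketch below. It assumes the separated signal has shape (n_samples, 2) at 44100 Hz, as returned by separate_track; restore_format and model_rate are hypothetical names, not part of the project.

```python
import numpy as np


def restore_format(separated, original_rate, is_mono, model_rate=44100):
    """Convert a separated signal of shape (n_samples, 2) back to the
    original channel layout and sample rate."""
    out = np.asarray(separated)

    if is_mono:
        # The mono input was duplicated into two channels before
        # separation, so averaging the channels folds it back to mono.
        out = out.mean(axis=1)

    if original_rate != model_rate:
        # Imported lazily: only needed when the rates differ.
        import resampy
        out = resampy.resample(out, model_rate, original_rate,
                               axis=0, filter='kaiser_best')

    return out


# Example: a stereo result that came from a mono 44.1 kHz file.
ramp = np.linspace(-1.0, 1.0, 8)
stereo = np.stack([ramp, ramp], axis=1)
mono = restore_format(stereo, original_rate=44100, is_mono=True)
```

Averaging the two channels is one possible choice for the mono fallback; keeping only the left channel would be another.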

Maybe the idea would be to add an extra method in the Python code that does not write temp.wav to the project root folder, but instead writes a named .wav (vocals / drums / bass / other) to a temporary project subfolder (separated, for example), and then zips the folder and provides a download link in Google Colab after the separation.
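A minimal sketch of the zip-and-download part of this idea, using only the standard library plus Colab's files helper when it is available. zip_folder and offer_download are hypothetical helpers, and the empty .wav files merely stand in for the separated stems.

```python
import os
import shutil
import tempfile
import zipfile


def zip_folder(folder, archive_base):
    """Zip the contents of `folder` and return the path of the archive."""
    # make_archive appends the '.zip' extension itself.
    return shutil.make_archive(archive_base, 'zip', root_dir=folder)


def offer_download(path):
    """Trigger a browser download in Colab; elsewhere just report the path."""
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        print("archive written to", path)
    else:
        files.download(path)


# Example with placeholder files standing in for the separated stems.
work_dir = tempfile.mkdtemp()
out_dir = os.path.join(work_dir, 'separated')
os.makedirs(out_dir, exist_ok=True)
for name in ('vocals', 'drums', 'bass', 'other'):
    with open(os.path.join(out_dir, name + '.wav'), 'wb') as f:
        f.write(b'')  # placeholder, not real audio

# The archive is written next to the folder, not inside it.
archive = zip_folder(out_dir, os.path.join(work_dir, 'separated'))
with zipfile.ZipFile(archive) as zf:
    archived_names = sorted(zf.namelist())
```

In a Colab cell, calling offer_download(archive) after the separation would then start the download in the browser.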

That could help potential users who do not have a Google Drive account.

For me, it works fine as it is. Thank you very much.

ws-choi commented 3 years ago

Thank you for sharing your experience. I'll update the code to reflect what you've recommended, sooner or later 👍

ws-choi / Conditioned-Source-Separation-LaSAFT

Memory footprint in Google Colab #2