keunwoochoi / music-auto_tagging-keras

Music auto-tagging models and trained weights in keras/theano
MIT License

Input file data shape of .npy file in compact_cnn #23

Closed: exeex closed this issue 7 years ago

exeex commented 7 years ago

Is the test file, 1100103.clip.npy, generated the same way as the other .npy files, through audio_processor.py?

When I use bensound-thejazzpiano.npy instead of 1100103.clip.npy, main.py throws a data shape mismatch error: ValueError: Error when checking : expected melspectrogram_input_1 to have 3 dimensions, but got array with shape (1, 1, 1, 1, 96, 1366)

How should I modify audio_processor.py to fix this error?

Thank you again.

keunwoochoi commented 7 years ago

No.

I just added a notebook.

thisisandreeeee commented 7 years ago

I followed the instructions in the notebook to create a numpy array of shape (16, 1, 44100), but encountered the following error.

ValueError: Error when checking : expected melspectrogram_input_6 to have shape (None, 1, 348000) but got array with shape (16, 1, 44100)

Could you advise on what I can do to proceed? I basically have a bunch of MP3 files that I wish to generate features for. Any help would be deeply appreciated!

keunwoochoi commented 7 years ago

Hi, as the error message says, it expects 348000 samples: 12000 Hz x 29 seconds. Sorry for the confusion; in the notebook I set it to 1 second just as an example. What structure are you using? If you want to use the pre-trained weights too, please check out https://github.com/keunwoochoi/music-auto_tagging-keras/tree/master/compact_cnn or https://github.com/keunwoochoi/transfer_learning_music, too!
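
As a quick check of the arithmetic in the reply above, here is a minimal sketch. The names `SR`, `DURATION_S` and `expected_num_samples` are illustrative and do not come from the repository; only the two numbers are taken from the reply.

```python
SR = 12000          # sampling rate in Hz, as stated in the reply
DURATION_S = 29.0   # clip length in seconds, as stated in the reply


def expected_num_samples(sr=SR, duration_s=DURATION_S):
    """Return the number of audio samples in a clip of the given length."""
    return int(sr * duration_s)


# The last dimension of the expected input shape (None, 1, 348000).
print(expected_num_samples())
```

Comparing this value with the last dimension of an array before calling `predict` should surface a length mismatch earlier than the Keras shape check does.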

thisisandreeeee commented 7 years ago

Thanks for the quick response @keunwoochoi, and for your patience on this matter. I'm relatively unfamiliar with manipulating MP3s as inputs for a neural network. I hope to use the compact_cnn for feature extraction, and all my MP3 files are 30 seconds long. The following describes the code involved:

import librosa
import numpy as np

def convert_mp3(file_loc):
  '''For reading the MP3 file and converting to np array'''
  # sr=None keeps the file's native sampling rate
  src, sr = librosa.load(file_loc, sr=None, mono=True)
  len_seconds = 30.
  src = src[:int(sr*len_seconds)]
  src = src[np.newaxis, :]
  return src

def extract_features(models, src):
  '''Takes an array of model objects of structure [main('feature'), main('feature', 3) ...] and generates predictions for the src array'''
  feat = [md.predict(src)[0] for md in models]
  feat = np.array(feat).reshape(-1)
  return feat

Everything else works fine, I'm just having difficulty converting my 30 second MP3 file into an array of shape that the model accepts.

keunwoochoi commented 7 years ago

This line specifies a 29.0-second input signal with a sampling rate of 12000 Hz. So...

src, sr = librosa.load(file_loc, sr=12000, mono=True)
# now src: (N, ) and sr: 12000.
len_seconds = 29.
src = src[:int(sr*len_seconds)]
src = src[np.newaxis, :]
# the src might be shorter than 29*12000 if the original signal is shorter than that.

This should do the job.
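
The snippet above only trims, so a clip shorter than 29 seconds would still come out with fewer than 348000 samples. Below is a minimal sketch of one way to handle that case. It assumes zero-padding is acceptable for short clips and that the model wants the (batch, 1, samples) layout shown in the error message; the helper name `fit_length` is hypothetical and not part of the repository. The signal is assumed to have been loaded already (for example with `librosa.load(file_loc, sr=12000, mono=True)` as above), so the block itself needs only NumPy.

```python
import numpy as np


def fit_length(src, sr=12000, len_seconds=29.0):
    """Trim or zero-pad a mono signal to exactly sr * len_seconds samples.

    Returns an array of shape (1, 1, n_target): one clip, one channel.
    """
    n_target = int(sr * len_seconds)
    # Trim anything beyond the target length.
    src = np.asarray(src, dtype=np.float32)[:n_target]
    # Zero-pad at the end if the clip is too short (an assumption, see above).
    if src.shape[0] < n_target:
        src = np.pad(src, (0, n_target - src.shape[0]), mode="constant")
    # Add batch and channel axes: (N,) -> (1, 1, N).
    return src[np.newaxis, np.newaxis, :]


# Example with synthetic signals standing in for loaded audio.
short_clip = np.ones(1000, dtype=np.float32)
long_clip = np.ones(400000, dtype=np.float32)
print(fit_length(short_clip).shape)
print(fit_length(long_clip).shape)
```

Several clips prepared this way could be stacked into one batch with `np.concatenate(clips, axis=0)`.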

thisisandreeeee commented 7 years ago

Thanks for your help @keunwoochoi, it works really well now!

keunwoochoi commented 7 years ago

Great!