keunwoochoi / music-auto_tagging-keras

Music auto-tagging models and trained weights in keras/theano
MIT License

Input dimension mismatch #16

Closed mv00147 closed 7 years ago

mv00147 commented 7 years ago

Hello, I managed to debug the audio-reading errors, and now when I run the code (example_tagging.py) I get the following error:

ValueError: Input dimension mismatch (input[0].shape[2]=96, input[1].shape[2]=1366)

Is there something you can suggest to correct this? Warm regards, Mahalakshmi

drscotthawley commented 7 years ago

I can confirm that this error appears with both the TensorFlow backend (even if one sets "image_dim_ordering": "th" as instructed) and the Theano backend.
Interestingly, the code 'dies' at a different place for each backend.

With the Theano backend, the error is...

$ python example_tagging.py
Running main() with network: cnn and backend: theano
Predicting...
Traceback (most recent call last):
  File "example_tagging.py", line 87, in <module>
    main(net)
  File "example_tagging.py", line 70, in main
    pred_tags = model.predict(melgrams)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1268, in predict
    batch_size=batch_size, verbose=verbose)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 946, in _predict_loop
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 959, in __call__
    return self.function(inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
ValueError: GpuElemwise. Input dimension mis-match. Input 2 (indices start at 0) has shape[2] == 1366, but the output's size on that axis is 96.
Apply node that caused the error: GpuElemwise{Composite{(((i0 - i1) i2 i3) + i4)}}[](GpuFromHost.0, GpuElemwise{Composite{(((i0 / i1) / i2) / i3)}}[(0, 0)].0, InplaceGpuDimShuffle{x,x,0,x}.0, GpuElemwise{Composite{inv(sqrt(((((i0 / i1) / i2) / i3) + i4)))}}[(0, 0)].0, InplaceGpuDimShuffle{x,x,0,x}.0)
Toposort index: 240
Inputs types: [GpuArrayType(float32, (False, False, False, False)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True))]
Inputs shapes: [(4, 1, 96, 1366), (1, 1, 96, 1), (1, 1, 1366, 1), (1, 1, 96, 1), (1, 1, 1366, 1)]
Inputs strides: [(524544, 524544, 5464, 4), (384, 384, 4, 4), (5464, 5464, 4, 4), (384, 384, 4, 4), (5464, 5464, 4, 4)]
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[if{inplace,gpu}(keras_learning_phase, GpuElemwise{Composite{(((i0 - i1) i2 i3) + i4)}}[].0, GpuElemwise{Composite{(((i0 - i1) i2) + i3)}}[(0, 0)].0)]]
$
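
The "Inputs shapes" line in the trace above can be checked with plain NumPy broadcasting. The sketch below is only an illustration, not the Theano graph itself: the array and function names are hypothetical, and only the shapes are taken from the trace. Operands sized 96 and 1366 on axis 2 cannot be combined elementwise, which is consistent with the reported mismatch.

```python
import numpy as np

# Shapes copied from the "Inputs shapes" line of the Theano trace.
# The names (x, stat, param) are illustrative assumptions.
x = np.zeros((4, 1, 96, 1366), dtype=np.float32)     # batch of mel-spectrograms
stat = np.zeros((1, 1, 96, 1), dtype=np.float32)     # operand sized 96 on axis 2
param = np.ones((1, 1, 1366, 1), dtype=np.float32)   # operand sized 1366 on axis 2


def broadcast_error(*arrays):
    """Return the broadcasting error message, or None if the shapes are compatible."""
    try:
        np.broadcast_arrays(*arrays)
    except ValueError as exc:
        return str(exc)
    return None


compatible = broadcast_error(x, stat)           # axis 2: 96 vs 96
mismatch = broadcast_error(x, stat, param)      # axis 2: 96 vs 1366

print(compatible is None)
print(mismatch is not None)
```

Both checks print True: the first pair of shapes broadcasts, while adding the operand with 1366 entries on axis 2 does not.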

Whereas with the Tensorflow backend the error is...

$ python example_tagging.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Running main() with network: cnn and backend: tensorflow
Traceback (most recent call last):
  File "example_tagging.py", line 87, in <module>
    main(net)
  File "example_tagging.py", line 63, in main
    model = MusicTaggerCNN(weights='msd')
  File "/home/myusername/music-auto_tagging-keras/music_tagger_cnn.py", line 137, in MusicTaggerCNN
    by_name=True)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2706, in load_weights
    self.load_weights_from_hdf5_group_by_name(f)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2843, in load_weights_from_hdf5_group_by_name
    K.batch_set_value(weight_value_tuples)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1833, in batch_set_value
    assign_op = x.assign(assign_placeholder)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 505, in assign
    return state_ops.assign(self._variable, value, use_locking=use_locking)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 749, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2382, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1783, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 596, in call_cpp_shape_fn
    raise ValueError(err.message)
ValueError: Dimension 0 in both shapes must be equal, but are 96 and 1366
$

keunwoochoi commented 7 years ago

Oops! I'll take a look. In the meantime, I'd suggest taking a look at compact_cnn. It may be better for transfer learning.

mv00147 commented 7 years ago

Dear sir, I have managed to get the code running. I just wanted to know a little more about the output. What does it mean when the tags assigned to an mp3 file are different every time I run it? Or is there some fault in the way I am running it? For example, the first time I ran the code, I got the following tags:

data/bensound-cute.mp3
[('House', '1.000'), ('happy', '1.000'), ('Hip-Hop', '0.998'), ('heavy metal', '0.998'), ('experimental', '0.997')]
[('classic rock', '0.996'), ('Mellow', '0.994'), ('electronica', '0.989'), ('80s', '0.982'), ('sad', '0.975')]

The second time, these were the tags:

data/bensound-cute.mp3
[('00s', '0.999'), ('electro', '0.992'), ('alternative', '0.992'), ('female vocalist', '0.989'), ('90s', '0.983')]
[('hard rock', '0.973'), ('rock', '0.969'), ('instrumental', '0.969'), ('70s', '0.948'), ('ambient', '0.904')]

Why do the tags change?

keunwoochoi commented 7 years ago

What structure are you using? The only reason I can think of is a different batch size.

keunwoochoi commented 7 years ago

Could you also elaborate on how you managed to get it to run? Thanks!

mv00147 commented 7 years ago

Hi, I ran it step by step and removed a few lines as follows:

import pdb
import time

import numpy as np
from keras import backend as K
from keras.layers import Input, Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import ELU
from keras.models import Model
from keras.utils.data_utils import get_file

import audio_processor as ap

##
def sort_result(tags, preds):
    result = zip(tags, preds)
    sorted_result = sorted(result, key=lambda x: x[1], reverse=True)
    return [(name, '%5.3f' % score) for name, score in sorted_result]

def librosa_exists():
    try:
        __import__('librosa')
    except ImportError:
        return False
    else:
        return True

def main():
    """Build the MusicTaggerCNN architecture and print the top-10 tags per track."""
    audio_paths = ['data/bensound-cute.mp3',
                   'data/bensound-actionable.mp3',
                   'data/bensound-dubstep.mp3',
                   'data/bensound-thejazzpiano.mp3']
    melgram_paths = ['data/bensound-cute.npy',
                     'data/bensound-actionable.npy',
                     'data/bensound-dubstep.npy',
                     'data/bensound-thejazzpiano.npy']

    tags = ['rock', 'pop', 'alternative', 'indie', 'electronic',
            'female vocalists', 'dance', '00s', 'alternative rock', 'jazz',
            'beautiful', 'metal', 'chillout', 'male vocalists',
            'classic rock', 'soul', 'indie rock', 'Mellow', 'electronica',
            '80s', 'folk', '90s', 'chill', 'instrumental', 'punk',
            'oldies', 'blues', 'hard rock', 'ambient', 'acoustic',
            'experimental', 'female vocalist', 'guitar', 'Hip-Hop',
            '70s', 'party', 'country', 'easy listening',
            'sexy', 'catchy', 'funk', 'electro', 'heavy metal',
            'Progressive rock', '60s', 'rnb', 'indie pop',
            'sad', 'House', 'happy']

    # prepare data like this
    melgrams = np.zeros((0, 1, 96, 1366))

    if librosa_exists():  # call the function; the bare name is always truthy
        for audio_path in audio_paths:
            melgram = ap.compute_melgram(audio_path)
            melgrams = np.concatenate((melgrams, melgram), axis=0)
    else:
        for melgram_path in melgram_paths:
            melgram = np.load(melgram_path)
            melgrams = np.concatenate((melgrams, melgram), axis=0)

    TH_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_theano.h5'
    weights='msd'
    input_tensor=None
    include_top=True

    if weights not in {'msd', None}:
        raise ValueError('The `weights` argument should be either '
                         '`None` (random initialization) or `msd` '
                         '(pre-training on Million Song Dataset).')
    if K.image_dim_ordering() == 'th':
        input_shape = (1, 96, 1366)
    else:
        input_shape = (96, 1366, 1)

    if input_tensor is None:
        melgram_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            melgram_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            melgram_input = input_tensor

    if K.image_dim_ordering() == 'th':
        channel_axis = 1
        freq_axis = 2
        time_axis = 3
    else:
        channel_axis = 3
        freq_axis = 1
        time_axis = 2

    x = BatchNormalization(axis=freq_axis, name='bn_0_freq')(melgram_input)

    x = Convolution2D(32, 3, 3, border_mode='same', name='conv1')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn1')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool1')(x)

    x = Convolution2D(64, 3, 3, border_mode='same', name='conv2')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn2')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool2')(x)

    x = Convolution2D(64, 3, 3, border_mode='same', name='conv3')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn3')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool3')(x)

    x = Convolution2D(64, 3, 3, border_mode='same', name='conv4')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn4')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(3, 5), name='pool4')(x)

    x = Convolution2D(32, 3, 3, border_mode='same', name='conv5')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn5')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(4, 4), name='pool5')(x)

    x = Flatten()(x)
    if include_top:
        x = Dense(50, activation='sigmoid', name='output')(x)
    model = Model(melgram_input, x)
    print(model)

    # Weight loading is commented out here, so the model keeps its random
    # initial weights:
    # if weights is None:
    #     return model
    # else:
    #     # Load input
    #     if K.image_dim_ordering() == 'tf':
    #         raise RuntimeError("Please set image_dim_ordering == 'th'."
    #                            "You can set it at ~/.keras/keras.json")
    #     model.load_weights('data/music_tagger_cnn_weights_%s.h5' % K._BACKEND,
    #                        by_name=True)

    # predict the tags like this
    print('Predicting...')
    start = time.time()
    pred_tags = model.predict(melgrams)
    # print like this...
    print('Prediction is done. It took %d seconds.' % (time.time() - start))
    print('Printing top-10 tags for each track...')

    for song_idx, audio_path in enumerate(audio_paths):
        sorted_result = sort_result(tags, pred_tags[song_idx, :].tolist())
        print(audio_path)
        print(sorted_result[:5])
        print(sorted_result[5:10])
        print(' ')

I have not altered the structure. I am using MusicTaggerCNN.
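
One thing worth noting about the script above: the model.load_weights call is commented out, so the network would run with randomly initialised weights, which would by itself give different scores on every run. This is a reading of the posted code, not something confirmed in the thread. The NumPy sketch below illustrates the effect with a single hypothetical sigmoid layer; the function name, the feature size, and the seeds are made up for illustration.

```python
import numpy as np


def random_layer_scores(features, n_tags, seed=None):
    """Score n_tags with a freshly initialised sigmoid layer (no trained weights)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(features.shape[0], n_tags))
    return 1.0 / (1.0 + np.exp(-features.dot(weights)))


# Stand-in for the flattened CNN features of one track (hypothetical size).
features = np.linspace(-1.0, 1.0, 32)

run_a = random_layer_scores(features, n_tags=50, seed=1)
run_b = random_layer_scores(features, n_tags=50, seed=2)
run_a_again = random_layer_scores(features, n_tags=50, seed=1)

print(np.allclose(run_a, run_b))        # two different initialisations
print(np.allclose(run_a, run_a_again))  # the same initialisation twice
```

Scores only repeat when the weights are identical, which is what loading the trained weights file would ensure.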

keunwoochoi commented 7 years ago

I've got no idea, but what version of Keras are you using? MusicTaggerCNN assumes an old version of Keras and, sadly, the versions are not compatible. Please check out the compact_cnn folder; it's newer.

drscotthawley commented 7 years ago

@keunwoochoi Ah, so sorry to bother you then. I was using the latest version of keras. Yes, I've looked at compact_cnn now -- thank you so much for sharing your code!

Like @mv00147, I wrote my own simplified version of the compact_cnn (because parts of it confused me), along with some instructions for training, and got some great results from a dataset of guitar sounds! Posted at https://github.com/drscotthawley/audio-classifier-keras-cnn @keunwoochoi, I strived to give you all the credit; hope it's ok with you, if not let me know and I'll take it down.

keunwoochoi commented 7 years ago

Awesome! I'm happy someone found it useful and working :) Maintaining it turned out to be not as easy as releasing it; sorry for the confusion. The network-building code of compact_cnn is not too clear, haha. Thanks for sharing it!

mv00147 commented 7 years ago

Hello Scott, Thank you for sharing the information. Would you be able to help me with representing my audio dataset in bumpy array form so I can train my own model? I am struggling with the representation. Warm regards, Mahalakshmi

mv00147 commented 7 years ago

*numpy array

drscotthawley commented 7 years ago

@mv00147 My guess is you should be able to just replace the instances of "librosa.load" with "numpy.load". If that doesn't help, then open an issue on the repo I posted, and I'll try to help you there, so we don't keep lighting up Keunwoo's issue tracker!
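
For the dataset question, the script posted earlier in the thread already hints at one layout: each clip's mel-spectrogram is concatenated onto an array of shape (0, 1, 96, 1366) along axis 0. A minimal sketch of that layout follows, assuming random arrays in place of real mel-spectrograms; the integer labels, the helper name, and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np


def stack_melgrams(melgrams):
    """Concatenate per-clip arrays of shape (1, 1, 96, 1366) into one dataset array."""
    return np.concatenate(melgrams, axis=0)


# Random placeholders standing in for the output of a mel-spectrogram function.
clips = [np.random.rand(1, 1, 96, 1366).astype(np.float32) for _ in range(3)]
X = stack_melgrams(clips)
y = np.array([0, 1, 0])  # hypothetical class label per clip

# Save features and labels together, then load them back.
path = os.path.join(tempfile.mkdtemp(), 'dataset.npz')
np.savez(path, X=X, y=y)
with np.load(path) as data:
    X_loaded, y_loaded = data['X'], data['y']

print(X_loaded.shape, y_loaded.shape)
```

Keeping features and labels in one .npz file means a training script only needs numpy.load, with no audio decoding at training time.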