Cannot use CRNN with Theano backend

franciscoaraposo commented 6 years ago

Hi,

Do you know if the pre-trained CRNN model works for any combination of library versions? I've tried tensorflow, but I have seen another issue here saying that it is impossible to load tensorflow weights. I'm using Keras 1.2.0 and Theano 0.9.0 and I'm getting the following error output during model.predict():

ValueError: GpuReshape: trying to reshape an array of total size 1440 into an array of total size 96. Apply node that caused the error: GpuReshape{4}(bn_0_freq_running_mean, TensorConstant{[ 1 1 96 1]}) Toposort index: 39 Inputs types: [GpuArrayType<None>(float32, (False,)), TensorType(int64, vector)] Inputs shapes: [(1440,), (4,)] Inputs strides: [(4,), (8,)] Inputs values: ['not shown', array([ 1, 1, 96, 1])] Outputs clients: [[GpuElemwise{sub,no_inplace}(GpuIncSubtensor{Set;::, ::, int64:int64:, int64:int64:}.0, GpuReshape{4}.0)]]

keunwoochoi commented 6 years ago

Could you also confirm the image_dim_ordering? It should be ‘th’ IIRC.

On 2 Oct 2017, at 12:47, franciscoaraposo notifications@github.com wrote:

Hi,

Do you know if the pre-trained CRNN model works for any combination of library versions? I've tried tensorflow, but I have seen another issue here saying that it is impossible to load tensorflow weights. I'm using Keras 1.2.0 and Theano 0.9.0 and I'm getting the following error during model.predict():

`ValueError: GpuElemwise. Input dimension mis-match. Input 2 (indices start at 0) has shape[2] == 1440, but the output's size on that axis is 96. Apply node that caused the error: GpuElemwise{Composite{(((i0 - i1) i2 i3) + i4)}}[](GpuIncSubtensor{InplaceSet;::, ::, int64:int64:, int64:int64:}.0, GpuElemwise{Composite{(((i0 / i1) / i2) / i3)}}[(0, 0)].0, InplaceGpuDimShuffle{x,x,0,x}.0, GpuElemwise{Composite{inv(sqrt(((((i0 / i1) / i2) / i3) + i4)))}}[(0, 0)].0, InplaceGpuDimShuffle{x,x,0,x}.0) Toposort index: 116 Inputs types: [GpuArrayType(float32, (False, False, False, False)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True))] Inputs shapes: [(1, 1, 96, 1440), (1, 1, 96, 1), (1, 1, 1440, 1), (1, 1, 96, 1), (1, 1, 1440, 1)] Inputs strides: [(552960, 552960, 5760, 4), (384, 384, 4, 4), (5760, 5760, 4, 4), (384, 384, 4, 4), (5760, 5760, 4, 4)] Inputs values: ['not shown', 'not shown', 'not shown', 'not shown', 'not shown'] Outputs clients: [[if{inplace,gpu}(keras_learning_phase, GpuElemwise{Composite{(((i0 - i1) i2 i3) + i4)}}[].0, GpuElemwise{Composite{(((i0 - i1) * i2) + i3)}}[(0, 0)].0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'. HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.`


franciscoaraposo commented 6 years ago

yes, image_dim_ordering is correct

keunwoochoi commented 6 years ago

Seems like it's about a shape mismatch. Have you checked whether you can run the examples?

franciscoaraposo commented 6 years ago

My example is even simpler:

`import music_tagger_crnn

model = music_tagger_crnn.MusicTaggerCRNN(include_top=False)

import audio_processor

melgram = audio_processor.compute_melgram('test.wav')

model.predict(melgram)`

keunwoochoi commented 6 years ago

Yes, but it omits a few steps. A Keras model expects a batch input shaped as a 4D array, (batch_axis, w, h, ch) in our case. Ref: https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/example_tagging.py#L68
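To illustrate the batching point, here is a minimal NumPy sketch of stacking per-track mel-spectrograms into one 4D batch. The per-track shape (1, 1, 96, 1366) and the helper name `stack_melgrams` are assumptions made for illustration, not taken from this thread; check the shape that compute_melgram() actually returns in your checkout.

```python
import numpy as np

# Assumed per-track shape: (1, 1, 96, N_FRAMES). N_FRAMES is a placeholder value.
N_MELS = 96
N_FRAMES = 1366


def stack_melgrams(melgrams):
    """Concatenate per-track arrays along axis 0 to form one batch."""
    return np.concatenate(melgrams, axis=0)


# Stand-ins for the arrays an audio front end would produce.
tracks = [np.zeros((1, 1, N_MELS, N_FRAMES), dtype=np.float32) for _ in range(3)]
batch = stack_melgrams(tracks)
print(batch.shape)
```

If each track array already carries a leading batch axis of size 1, as assumed here, a single track can be passed to predict() unchanged.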

franciscoaraposo commented 6 years ago

not the problem, your compute_melgram() function deals with it. furthermore, I just ran the features example and got the same error

franciscoaraposo commented 6 years ago

by the way, that script has a bug: it's "model.predict" not "model.prediun"... have you even ever run that example?

keunwoochoi commented 6 years ago

by the way, that script has a bug: it's "model.predict" not "model.prediun"... have you even ever run that example?

I actually just found it and fixed it. Don't understand why it's like that...

franciscoaraposo commented 6 years ago

example_tagging gives the same error

franciscoaraposo commented 6 years ago

i've been trying many different combinations of library versions (including libgpuarray) and still no luck...

keunwoochoi commented 6 years ago

Alright.. I haven't really looked at the CNN/CRNN in the main directory for a while. Could you have a look at the compact_cnn folder? And what is it for? Tagging? Feature extraction?

franciscoaraposo commented 6 years ago

We (our research group) already looked at that one and were able to get it to work but now we want to compare it with the CRNN version. since the old CNN is not that different from the compact_cnn, we don't care about that one as much, but we really want to try out the CRNN pre-trained model...

franciscoaraposo commented 6 years ago

i guess this may be a library version issue. if you get it to work, please tell me which versions of every important library you used. i'm using anaconda so it's not hard to switch between them

keunwoochoi commented 6 years ago

Ok, now trying to get a working requirements.txt. Will let you know soon.

On 2 Oct 2017, at 14:24, franciscoaraposo notifications@github.com wrote:

i guess this may be a library version issue. if you get it to work, please tell me which versions of every important library you used. i'm using anaconda so it's not hard to switch between them


keunwoochoi commented 6 years ago

Embarrassing that I couldn't find a solution. It should be Theano 0.8.2.

To get the CRNN to work, please do $ git checkout a399872 with Keras 1.1. Because it's not Keras 1.0.6, the CNN will run but won't work properly. The CRNN does, I just checked. Sorry for such a mess.

keunwoochoi commented 6 years ago

I'm not really maintaining this repo, but I probably need separate example files for cnn/crnn, and to revert some of the code/weights to a399872 or whatever. Can't find the time at the moment, though.

franciscoaraposo commented 6 years ago

No worries, version management can be a mess sometimes, especially with such fast developing libraries. Thank you for finding me a working version. I will check tomorrow and reply here to confirm.

franciscoaraposo commented 6 years ago

Hi, I installed both of those versions, i.e., Keras 1.1.0 and Theano 0.8.2 (by the way, this version does not support libgpuarray, so I had to use nvcc) and checked out that specific version of the repo and still get a similar error:

ValueError: GpuReshape: cannot reshape input of shape (1440) to shape (1, 1, 96, 1). Apply node that caused the error: GpuReshape{4}(bn_0_freq_gamma, MakeVector{dtype='int64'}.0) Toposort index: 63 Inputs types: [CudaNdarrayType(float32, vector), TensorType(int64, vector)] Inputs shapes: [(1440,), (4,)] Inputs strides: [(1,), (8,)] Inputs values: ['not shown', array([ 1, 1, 96, 1])] Outputs clients: [[GpuElemwise{Composite{(i0 / sqrt(clip((i1 + i2), i3, i4)))}}[(0, 2)](GpuReshape{4}.0, CudaNdarrayConstant{[[[[ 9.99999975e-06]]]]}, GpuReshape{4}.0, CudaNdarrayConstant{[[[[ 0.]]]]}, CudaNdarrayConstant{[[[[ inf]]]]})]]

keunwoochoi commented 6 years ago

Here's my screen. What else do you want me to check out?

(venv_k110)keunwoo@weaver4[music-auto_tagging-keras]$ python
Python 2.7.5 (default, Nov  6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
>>> theano.__version__
'0.8.2'
>>> import keras
Using Theano backend.
>>> keras.__version__
'1.1.0'
>>> keras.backend.image_dim_ordering()
'th'
(venv_k110)keunwoo@weaver4[music-auto_tagging-keras]$ git branch
* (detached from a399872)
  master
(venv_k110)keunwoo@weaver4[music-auto_tagging-keras]$ python example_tagging.py
Using Theano backend.
Running main() with network: cnn and backend: theano
Predicting...
Prediction is done. It took 6 seconds.
Printing top-10 tags for each track...
data/bensound-cute.mp3
[('rock', '0.000'), ('pop', '0.000'), ('alternative', '0.000'), ('indie', '0.000'), ('electronic', '0.000')]
[('female vocalists', '0.000'), ('dance', '0.000'), ('00s', '0.000'), ('alternative rock', '0.000'), ('jazz', '0.000')]

data/bensound-actionable.mp3
[('rock', '0.000'), ('pop', '0.000'), ('alternative', '0.000'), ('indie', '0.000'), ('electronic', '0.000')]
[('female vocalists', '0.000'), ('dance', '0.000'), ('00s', '0.000'), ('alternative rock', '0.000'), ('jazz', '0.000')]

data/bensound-dubstep.mp3
[('rock', '0.000'), ('pop', '0.000'), ('alternative', '0.000'), ('indie', '0.000'), ('electronic', '0.000')]
[('female vocalists', '0.000'), ('dance', '0.000'), ('00s', '0.000'), ('alternative rock', '0.000'), ('jazz', '0.000')]

data/bensound-thejazzpiano.mp3
[('jazz', '1.000'), ('rock', '0.000'), ('pop', '0.000'), ('alternative', '0.000'), ('indie', '0.000')]
[('electronic', '0.000'), ('female vocalists', '0.000'), ('dance', '0.000'), ('00s', '0.000'), ('alternative rock', '0.000')]

Running main() with network: crnn and backend: theano
Predicting...
Prediction is done. It took 10 seconds.
Printing top-10 tags for each track...
data/bensound-cute.mp3
[('jazz', '0.511'), ('instrumental', '0.163'), ('guitar', '0.068'), ('ambient', '0.067'), ('electronic', '0.057')]
[('experimental', '0.043'), ('blues', '0.040'), ('chillout', '0.037'), ('folk', '0.036'), ('rock', '0.030')]

data/bensound-actionable.mp3
[('jazz', '0.365'), ('instrumental', '0.215'), ('guitar', '0.092'), ('rock', '0.084'), ('blues', '0.075')]
[('experimental', '0.058'), ('Hip-Hop', '0.056'), ('folk', '0.054'), ('electronic', '0.052'), ('Progressive rock', '0.051')]

data/bensound-dubstep.mp3
[('Hip-Hop', '0.160'), ('electronic', '0.127'), ('rock', '0.115'), ('alternative', '0.061'), ('instrumental', '0.051')]
[('jazz', '0.048'), ('electronica', '0.042'), ('experimental', '0.042'), ('metal', '0.033'), ('alternative rock', '0.030')]

data/bensound-thejazzpiano.mp3
[('electronic', '0.199'), ('jazz', '0.173'), ('instrumental', '0.120'), ('ambient', '0.098'), ('chillout', '0.092')]
[('electronica', '0.060'), ('rock', '0.040'), ('chill', '0.036'), ('experimental', '0.034'), ('funk', '0.031')]

franciscoaraposo commented 6 years ago

Running the same example, now with CPU instead of GPU, still crashes the script when predicting with CRNN, but provides more detailed information. Maybe this can help:

Running main() with network: crnn and backend: theano
Predicting...
Traceback (most recent call last):
  File "example_tagging.py", line 91, in <module>
    main(net)
  File "example_tagging.py", line 74, in main
    pred_tags = model.predict(melgrams)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1179, in predict
    batch_size=batch_size, verbose=verbose)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 878, in _predict_loop
    batch_outs = f(ins_batch)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 717, in __call__
    return self.function(*inputs)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
ValueError: cannot reshape array of size 1440 into shape (1,1,96,1)
Apply node that caused the error: Reshape{4}(bn_0_freq_gamma, MakeVector{dtype='int64'}.0)
Toposort index: 57
Inputs types: [TensorType(float32, vector), TensorType(int64, vector)]
Inputs shapes: [(1440,), (4,)]
Inputs strides: [(4,), (8,)]
Inputs values: ['not shown', array([ 1,  1, 96,  1])]
Outputs clients: [[Elemwise{Composite{(i0 / sqrt(clip((i1 + i2), i3, i4)))}}[(0, 2)](Reshape{4}.0, TensorConstant{(1, 1, 1, ..) of 1e-05}, Reshape{4}.0, TensorConstant{(1, 1, 1, 1) of 0.0}, TensorConstant{(1, 1, 1, 1) of inf})]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "example_tagging.py", line 91, in <module>
    main(net)
  File "example_tagging.py", line 69, in main
    model = MusicTaggerCRNN(weights='msd')
  File "/home/francisco/workspace/music-auto_tagging-keras/music_tagger_crnn.py", line 90, in MusicTaggerCRNN
    x = BatchNormalization(axis=freq_axis, name='bn_0_freq')(x)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 514, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 572, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 149, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/layers/normalization.py", line 138, in call
    epsilon=self.epsilon)
  File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 412, in normalize_batch_in_training
    broadcast_gamma = T.reshape(gamma, target_shape)
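
The failing step can be reproduced outside Theano. The sketch below is plain NumPy, not the Keras/Theano code path, and only mirrors the sizes reported in the traceback: a parameter vector of length 1440 cannot be reshaped to (1, 1, 96, 1), whereas one of length 96 can. The helper `try_reshape` is hypothetical. One reading, stated as an assumption, is that the stored bn_0_freq parameters were created for an axis of length 1440 while the graph is built for an axis of length 96.

```python
import numpy as np

# Sizes copied from the traceback: gamma holds 1440 values,
# while the graph asks for a broadcastable shape of (1, 1, 96, 1).
gamma_loaded = np.ones(1440, dtype=np.float32)
target_shape = (1, 1, 96, 1)


def try_reshape(param, shape):
    """Return the reshaped array, or the error message if reshaping fails."""
    try:
        return param.reshape(shape)
    except ValueError as exc:
        return str(exc)


failed = try_reshape(gamma_loaded, target_shape)
worked = try_reshape(np.ones(96, dtype=np.float32), target_shape)
print(failed)
print(worked.shape)
```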

franciscoaraposo commented 6 years ago

Are you using anaconda or virtualenv? If so, can you please list every command you run, starting from creating the conda/virtualenv environment? I've even tried different Python versions and still no luck... Also, it's better if you test it on the CPU so we don't have any CUDA or cuDNN differences...

keunwoochoi commented 6 years ago

Confirmed again. With virtualenv on linux.

virtualenv venvtemp
source venvtemp/bin/activate
pip install pip --upgrade
pip install keras==1.1.0 h5py theano==0.8.2 librosa

and

git checkout a399872

did the job.
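
A quick way to confirm that an environment matches the pinned versions above is to compare version strings. This is a minimal sketch: `EXPECTED` and `find_mismatches` are hypothetical names, and the `installed` dictionary would in practice be filled from `keras.__version__` and `theano.__version__` in the active environment.

```python
# Versions pinned in the pip command above.
EXPECTED = {"keras": "1.1.0", "theano": "0.8.2"}


def find_mismatches(installed, expected=EXPECTED):
    """Return {package: (installed, expected)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        have = installed.get(name)  # None when the package is not listed
        if have != wanted:
            mismatches[name] = (have, wanted)
    return mismatches


# Example input: the versions reported at the top of this thread.
reported = {"keras": "1.2.0", "theano": "0.9.0"}
print(find_mismatches(reported))
```

An empty result only says the versions match; it does not check which commit of the repo is checked out.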

franciscoaraposo commented 6 years ago

OK, I finally found the issue. When I did the checkout, it did not replace the CRNN model file, since I had slightly changed it. I guess the problem with the most recent version of the CRNN is that you're doing batch normalization along the frequency axis instead of along the time axis (which is what a399872 does). However, you want to normalize along the frequency axis instead of time, right? If so, the real solution would be to normalize along the frequency axis BUT with mode=1 instead of mode=0, since you want to do it on a song-by-song basis. So my question now is: what version of the software was used to train the weights for the CRNN? My guess is that no version was ever doing axis=freq and mode=1, so the weights are from a model that was wrongly implemented...

keunwoochoi commented 6 years ago

Okay, so trying with my example should work, right? Have you tried?

When I was doing BN along time axis, I did it by mistake. Although it turns out that it doesn't really matter much. BN with mode=1 had some keras-related bug back then, so I had to use mode=0 as far as I recall.

So, with respect to what I intended (which is freq-axis BN), a399872 is wrong. But it works well.
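
To make the axis discussion concrete, here is a small NumPy sketch of feature-wise normalization along a chosen axis. It is not the Keras BatchNormalization implementation: there are no learned gamma/beta or running averages, and the input shape (batch, 1, 96, 1440) with freq axis 2 and time axis 3 is assumed from the shapes in the error messages above. It shows that the per-axis statistics have 96 entries for the frequency axis and 1440 for the time axis, which is why parameters saved for one choice do not fit the other.

```python
import numpy as np

FREQ_AXIS, TIME_AXIS = 2, 3  # assumed 'th' layout: (batch, channel, freq, time)


def feature_wise_stats(x, axis):
    """Mean and variance per index of `axis`, reduced over all other axes."""
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    return x.mean(axis=reduce_axes), x.var(axis=reduce_axes)


def normalize(x, axis, eps=1e-5):
    """Normalize x with statistics computed per index of `axis`."""
    mean, var = feature_wise_stats(x, axis)
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]  # e.g. (1, 1, 96, 1) for the frequency axis
    return (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)


rng = np.random.RandomState(0)
x = rng.randn(2, 1, 96, 1440).astype(np.float32)

freq_mean, _ = feature_wise_stats(x, FREQ_AXIS)
time_mean, _ = feature_wise_stats(x, TIME_AXIS)
print(freq_mean.shape, time_mean.shape)

normalized = normalize(x, FREQ_AXIS)
```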

franciscoaraposo commented 6 years ago

Hi, Yes it works, in the sense that it does not crash. But, in that same sense, axis=freq and mode=1 also works (at least during prediction). However, since I want to use the pre-trained model, I should use axis=time and mode=0 since I guess that was how the code was at the time you trained the model. Is that correct?

TakeatEasy commented 4 years ago

I have used Theano (0.8.2) & Keras (1.1.0) and got the same error on example_tagging. What should I do to make it work?

thuyduong991234 commented 3 years ago

I have used Theano (0.8.2) & Keras (1.1.0) and got the same error on example_tagging. What should I do to make it work?

You should do:

git checkout a399872
Fix any errors that occur (Google has answers)

keunwoochoi / music-auto_tagging-keras

Cannot use CRNN with Theano backend #33