franciscoaraposo opened this issue 6 years ago (status: Open)
Could you also confirm the image_dim_ordering? It should be ‘th’ IIRC.
On 2 Oct 2017, at 12:47, franciscoaraposo notifications@github.com wrote:
Hi,
Do you know if the pre-trained CRNN model works for any combination of library versions? I've tried tensorflow, but I have seen another issue here saying that it is impossible to load tensorflow weights. I'm using Keras 1.2.0 and Theano 0.9.0 and I'm getting the following error during model.predict():
```
ValueError: GpuElemwise. Input dimension mis-match. Input 2 (indices start at 0) has shape[2] == 1440, but the output's size on that axis is 96.
Apply node that caused the error: GpuElemwise{Composite{(((i0 - i1) * i2 * i3) + i4)}}[](GpuIncSubtensor{InplaceSet;::, ::, int64:int64:, int64:int64:}.0, GpuElemwise{Composite{(((i0 / i1) / i2) / i3)}}[(0, 0)].0, InplaceGpuDimShuffle{x,x,0,x}.0, GpuElemwise{Composite{inv(sqrt(((((i0 / i1) / i2) / i3) + i4)))}}[(0, 0)].0, InplaceGpuDimShuffle{x,x,0,x}.0)
Toposort index: 116
Inputs types: [GpuArrayType(float32, (False, False, False, False)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True)), GpuArrayType(float32, (True, True, False, True))]
Inputs shapes: [(1, 1, 96, 1440), (1, 1, 96, 1), (1, 1, 1440, 1), (1, 1, 96, 1), (1, 1, 1440, 1)]
Inputs strides: [(552960, 552960, 5760, 4), (384, 384, 4, 4), (5760, 5760, 4, 4), (384, 384, 4, 4), (5760, 5760, 4, 4)]
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[if{inplace,gpu}(keras_learning_phase, GpuElemwise{Composite{(((i0 - i1) * i2 * i3) + i4)}}[].0, GpuElemwise{Composite{(((i0 - i1) * i2) + i3)}}[(0, 0)].0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
```
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/keunwoochoi/music-auto_tagging-keras/issues/33, or mute the thread https://github.com/notifications/unsubscribe-auth/APZ8xZEQQjBA85sD6qJoiHkdCYRE_0p5ks5soM1YgaJpZM4Pqhzm.
Yes, image_dim_ordering is correct.
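For reference, a minimal Python 3 sketch of checking this setting outside a Keras session, assuming the Keras 1.x convention of storing it under the `image_dim_ordering` key of `~/.keras/keras.json`. Inside a session, `keras.backend.image_dim_ordering()` returns the active value. The helper names are hypothetical.

```python
import json
import os

def dim_ordering_from_config(config_text):
    """Return the image_dim_ordering value from a Keras 1.x keras.json string (None if absent)."""
    return json.loads(config_text).get("image_dim_ordering")

def read_dim_ordering(path=os.path.expanduser("~/.keras/keras.json")):
    """Read the ordering from the user's Keras config file; None if the file does not exist."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return dim_ordering_from_config(f.read())

if __name__ == "__main__":
    ordering = read_dim_ordering()
    print("image_dim_ordering =", ordering)
    if ordering != "th":
        # The pre-trained weights discussed in this thread expect 'th'.
        print("Warning: expected 'th'")
```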
It seems to be a shape mismatch. Have you checked whether you can run the examples?
My example is even simpler:
```python
import audio_processor
import music_tagger_crnn

model = music_tagger_crnn.MusicTaggerCRNN(include_top=False)
melgram = audio_processor.compute_melgram('test.wav')
model.predict(melgram)
```
Yes, but it omits a few steps. A Keras model expects a batch input, i.e., a 4D array, which is (batch_axis, w, h, ch) in our case. Ref: https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/example_tagging.py#L68
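A minimal NumPy sketch of building such a batch, assuming a single mel-spectrogram of shape (freq, time). Note that with 'th' ordering the channel axis comes before the frequency and time axes, which is why the error logs show tensors shaped (1, 1, 96, 1440). The helper name and the 1366-frame length are assumptions for illustration.

```python
import numpy as np

def to_th_batch(melgram):
    """Return `melgram` as a 4D 'th'-ordered batch: (batch, channel, freq, time).

    Accepts a 2D (freq, time) spectrogram, a 3D (channel, freq, time) array,
    or an already-batched 4D array.
    """
    melgram = np.asarray(melgram)
    if melgram.ndim not in (2, 3, 4):
        raise ValueError("expected a 2D, 3D or 4D array, got %dD" % melgram.ndim)
    while melgram.ndim < 4:
        melgram = melgram[np.newaxis, ...]  # prepend the missing channel/batch axes
    return melgram

# One 96-band mel-spectrogram; the 1366-frame length is an assumed example size.
single = np.zeros((96, 1366), dtype=np.float32)
print(to_th_batch(single).shape)  # (1, 1, 96, 1366)

# Several tracks can be stacked along the batch axis before model.predict().
batch = np.concatenate([to_th_batch(single), to_th_batch(single)], axis=0)
print(batch.shape)  # (2, 1, 96, 1366)
```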
That's not the problem; your compute_melgram() function deals with it. Furthermore, I just ran the features example and got the same error.
By the way, that script has a bug: it's "model.predict", not "model.prediun"... have you ever even run that example?
I just found it and fixed it. I don't understand why it was like that...
example_tagging.py gives the same error.
I've been trying many different combinations of library versions (including libgpuarray) and still no luck...
Alright, I haven't really looked at the CNN/CRNN in the main directory for a while. Could you have a look at the compact_cnn folder? And what is it for: tagging or feature extraction?
We (our research group) already looked at that one and were able to get it to work, but now we want to compare it with the CRNN version. Since the old CNN is not that different from the compact_cnn, we don't care about that one as much, but we really want to try out the pre-trained CRNN model...
I guess this may be a library version issue. If you get it to work, please tell me which version of every important library you used. I'm using Anaconda, so it's not hard to switch between them.
OK, now trying to get a working requirements.txt. Will let you know soon.
On 2 Oct 2017, at 14:24, franciscoaraposo notifications@github.com wrote:
I guess this may be a library version issue. If you get it to work, please tell me which version of every important library you used. I'm using Anaconda, so it's not hard to switch between them.
It's embarrassing that I couldn't find a solution. It should be Theano 0.8.2.
To get the CRNN to work, please run `git checkout a399872` with Keras 1.1. Because it's not Keras 1.0.6, the CNN will run but won't work properly. The CRNN does work; I just checked. Sorry for the mess.
I'm not really maintaining this repo, but I probably need separate example files for the CNN and CRNN, or to revert some of the code/weights to a399872 or something similar. I can't find the time at the moment, though.
No worries, version management can be a mess sometimes, especially with such fast-developing libraries. Thank you for finding me a working version. I will check tomorrow and reply here to confirm.
Hi, I installed both of those versions, i.e., Keras 1.1.0 and Theano 0.8.2 (by the way, this Theano version does not support libgpuarray, so I had to use nvcc), checked out that specific repo version, and I still get a similar error:
ValueError: GpuReshape: cannot reshape input of shape (1440) to shape (1, 1, 96, 1). Apply node that caused the error: GpuReshape{4}(bn_0_freq_gamma, MakeVector{dtype='int64'}.0) Toposort index: 63 Inputs types: [CudaNdarrayType(float32, vector), TensorType(int64, vector)] Inputs shapes: [(1440,), (4,)] Inputs strides: [(1,), (8,)] Inputs values: ['not shown', array([ 1, 1, 96, 1])] Outputs clients: [[GpuElemwise{Composite{(i0 / sqrt(clip((i1 + i2), i3, i4)))}}[(0, 2)](GpuReshape{4}.0, CudaNdarrayConstant{[[[[ 9.99999975e-06]]]]}, GpuReshape{4}.0, CudaNdarrayConstant{[[[[ 0.]]]]}, CudaNdarrayConstant{[[[[ inf]]]]})]]
Here's my terminal output. What else would you like me to check?
(venv_k110)keunwoo@weaver4[music-auto_tagging-keras]$ python
Python 2.7.5 (default, Nov 6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
>>> theano.__version__
'0.8.2'
>>> import keras
Using Theano backend.
>>> keras.__version__
'1.1.0'
>>> keras.backend.image_dim_ordering()
'th'
(venv_k110)keunwoo@weaver4[music-auto_tagging-keras]$ git branch
* (detached from a399872)
master
(venv_k110)keunwoo@weaver4[music-auto_tagging-keras]$ python example_tagging.py
Using Theano backend.
Running main() with network: cnn and backend: theano
Predicting...
Prediction is done. It took 6 seconds.
Printing top-10 tags for each track...
data/bensound-cute.mp3
[('rock', '0.000'), ('pop', '0.000'), ('alternative', '0.000'), ('indie', '0.000'), ('electronic', '0.000')]
[('female vocalists', '0.000'), ('dance', '0.000'), ('00s', '0.000'), ('alternative rock', '0.000'), ('jazz', '0.000')]
data/bensound-actionable.mp3
[('rock', '0.000'), ('pop', '0.000'), ('alternative', '0.000'), ('indie', '0.000'), ('electronic', '0.000')]
[('female vocalists', '0.000'), ('dance', '0.000'), ('00s', '0.000'), ('alternative rock', '0.000'), ('jazz', '0.000')]
data/bensound-dubstep.mp3
[('rock', '0.000'), ('pop', '0.000'), ('alternative', '0.000'), ('indie', '0.000'), ('electronic', '0.000')]
[('female vocalists', '0.000'), ('dance', '0.000'), ('00s', '0.000'), ('alternative rock', '0.000'), ('jazz', '0.000')]
data/bensound-thejazzpiano.mp3
[('jazz', '1.000'), ('rock', '0.000'), ('pop', '0.000'), ('alternative', '0.000'), ('indie', '0.000')]
[('electronic', '0.000'), ('female vocalists', '0.000'), ('dance', '0.000'), ('00s', '0.000'), ('alternative rock', '0.000')]
Running main() with network: crnn and backend: theano
Predicting...
Prediction is done. It took 10 seconds.
Printing top-10 tags for each track...
data/bensound-cute.mp3
[('jazz', '0.511'), ('instrumental', '0.163'), ('guitar', '0.068'), ('ambient', '0.067'), ('electronic', '0.057')]
[('experimental', '0.043'), ('blues', '0.040'), ('chillout', '0.037'), ('folk', '0.036'), ('rock', '0.030')]
data/bensound-actionable.mp3
[('jazz', '0.365'), ('instrumental', '0.215'), ('guitar', '0.092'), ('rock', '0.084'), ('blues', '0.075')]
[('experimental', '0.058'), ('Hip-Hop', '0.056'), ('folk', '0.054'), ('electronic', '0.052'), ('Progressive rock', '0.051')]
data/bensound-dubstep.mp3
[('Hip-Hop', '0.160'), ('electronic', '0.127'), ('rock', '0.115'), ('alternative', '0.061'), ('instrumental', '0.051')]
[('jazz', '0.048'), ('electronica', '0.042'), ('experimental', '0.042'), ('metal', '0.033'), ('alternative rock', '0.030')]
data/bensound-thejazzpiano.mp3
[('electronic', '0.199'), ('jazz', '0.173'), ('instrumental', '0.120'), ('ambient', '0.098'), ('chillout', '0.092')]
[('electronica', '0.060'), ('rock', '0.040'), ('chill', '0.036'), ('experimental', '0.034'), ('funk', '0.031')]
Running the same example on the CPU instead of the GPU still crashes the script when predicting with the CRNN, but it provides more detailed information. Maybe this can help:
Running main() with network: crnn and backend: theano
Predicting...
Traceback (most recent call last):
File "example_tagging.py", line 91, in <module>
main(net)
File "example_tagging.py", line 74, in main
pred_tags = model.predict(melgrams)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1179, in predict
batch_size=batch_size, verbose=verbose)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 878, in _predict_loop
batch_outs = f(ins_batch)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 717, in __call__
return self.function(*inputs)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/francisco/anaconda2/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
ValueError: cannot reshape array of size 1440 into shape (1,1,96,1)
Apply node that caused the error: Reshape{4}(bn_0_freq_gamma, MakeVector{dtype='int64'}.0)
Toposort index: 57
Inputs types: [TensorType(float32, vector), TensorType(int64, vector)]
Inputs shapes: [(1440,), (4,)]
Inputs strides: [(4,), (8,)]
Inputs values: ['not shown', array([ 1, 1, 96, 1])]
Outputs clients: [[Elemwise{Composite{(i0 / sqrt(clip((i1 + i2), i3, i4)))}}[(0, 2)](Reshape{4}.0, TensorConstant{(1, 1, 1, ..) of 1e-05}, Reshape{4}.0, TensorConstant{(1, 1, 1, 1) of 0.0}, TensorConstant{(1, 1, 1, 1) of inf})]]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "example_tagging.py", line 91, in <module>
main(net)
File "example_tagging.py", line 69, in main
model = MusicTaggerCRNN(weights='msd')
File "/home/francisco/workspace/music-auto_tagging-keras/music_tagger_crnn.py", line 90, in MusicTaggerCRNN
x = BatchNormalization(axis=freq_axis, name='bn_0_freq')(x)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 514, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 572, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 149, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/layers/normalization.py", line 138, in call
epsilon=self.epsilon)
File "/home/francisco/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 412, in normalize_batch_in_training
broadcast_gamma = T.reshape(gamma, target_shape)
Are you using Anaconda or virtualenv? Either way, could you please list every command you ran from the moment you created the conda/virtualenv environment? I've even tried different Python versions, still with no luck... Also, it would be better to test on the CPU so that CUDA or cuDNN differences don't come into play.
Confirmed again, with virtualenv on Linux:
virtualenv venvtemp
source venvtemp/bin/activate
pip install pip --upgrade
pip install keras==1.1.0 h5py theano==0.8.2 librosa
and
git checkout a399872
did the job.
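A minimal sketch for confirming that an environment matches the versions reported to work above (Keras 1.1.0, Theano 0.8.2) before running the examples. It assumes Python 3.8+ for `importlib.metadata`, so it will not run under the Python 2.7 interpreter shown in the session; there, printing `keras.__version__` and `theano.__version__` does the same job. Lookups use PyPI distribution names, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this thread to run the CRNN example (with commit a399872).
PINNED = {"keras": "1.1.0", "theano": "0.8.2"}

def find_mismatches(installed, pinned=PINNED):
    """Return {package: (installed_or_None, expected)} for every pin that is not met."""
    return {
        name: (installed.get(name), expected)
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }

def installed_versions(names):
    """Look up installed distribution versions; packages that are not installed map to None."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

if __name__ == "__main__":
    mismatches = find_mismatches(installed_versions(PINNED))
    for name, (got, want) in mismatches.items():
        print(f"{name}: installed {got}, expected {want}")
    if not mismatches:
        print("All pinned versions match.")
```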
OK, I finally found the issue. When I did the checkout, it did not replace the CRNN model file, since I had slightly changed it. I guess the problem with the most recent version of the CRNN is that it does batch normalization along the frequency axis instead of along the time axis (which is what a399872 does). However, you want to normalize along the frequency axis rather than the time axis, right? If so, the real solution would be to normalize along the frequency axis BUT with mode=1 instead of mode=0, since you want to do it on a song-by-song basis. So my question now is: what version of the code was used to train the CRNN weights? My guess is that no version ever used axis=freq and mode=1, so the weights come from a model that was wrongly implemented...
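The CPU traceback points at `BatchNormalization(axis=freq_axis, name='bn_0_freq')` and at the reshape of `gamma` into a broadcast `target_shape`. A minimal NumPy sketch of that inference-time step (an illustration, not the Keras/Theano code) shows why vectors saved from a time-axis layer cannot be used by a frequency-axis one: with the (1, 1, 96, 1440) input from the logs, the stored 1440-long vectors only fit the time axis. The function names and identity parameter values are hypothetical.

```python
import numpy as np

def bn_param_shape(input_shape, axis):
    """Broadcast shape for a per-axis BN parameter (gamma/beta/mean/var):
    every axis is 1 except `axis`, which keeps the input's size."""
    shape = [1] * len(input_shape)
    shape[axis] = input_shape[axis]
    return tuple(shape)

def apply_bn_inference(x, gamma, beta, mean, var, axis, epsilon=1e-5):
    # Reshape each 1D parameter vector so it broadcasts along `axis` only;
    # np.reshape raises ValueError if the vector length does not match that axis.
    target = bn_param_shape(x.shape, axis)
    gamma, beta, mean, var = (np.reshape(p, target) for p in (gamma, beta, mean, var))
    return (x - mean) / np.sqrt(var + epsilon) * gamma + beta

# 'th' ordering: (batch, channel, freq, time); shape taken from the error logs.
x = np.random.rand(1, 1, 96, 1440).astype(np.float32)
# Hypothetical identity parameters of length 1440 (gamma, beta, mean, var).
time_axis_params = [np.ones(1440, np.float32), np.zeros(1440, np.float32),
                    np.zeros(1440, np.float32), np.ones(1440, np.float32)]

# Time-axis parameters fit a time-axis layer...
y = apply_bn_inference(x, *time_axis_params, axis=3)
print(y.shape)  # (1, 1, 96, 1440)

# ...but not a frequency-axis layer (size 96).
try:
    apply_bn_inference(x, *time_axis_params, axis=2)
except ValueError as err:
    print("ValueError:", err)
```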
Okay, so trying with my example should work, right? Have you tried?
Doing BN along the time axis was a mistake on my part, although it turns out that it doesn't matter much. As far as I recall, BN with mode=1 had a Keras-related bug back then, so I had to use mode=0.
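To make the mode discussion concrete, here is a NumPy sketch contrasting two normalizations of a 'th'-ordered batch: one using statistics frozen at training time (roughly what a mode=0 layer does at prediction time) and a hypothetical per-song variant that recomputes statistics for each track. The per-song function only illustrates the song-by-song idea; it is not Keras's mode=1 implementation. Gamma/beta are omitted, and the shapes and function names are assumptions.

```python
import numpy as np

FREQ_AXIS = 2  # 'th' ordering: (batch, channel, freq, time)

def _broadcast(vec, ndim, axis):
    # Reshape a 1D vector so it lines up with `axis` of an ndim-dimensional array.
    shape = [1] * ndim
    shape[axis] = -1
    return np.reshape(vec, shape)

def norm_running_stats(x, running_mean, running_var, axis=FREQ_AXIS, eps=1e-5):
    """Feature-wise normalization with statistics fixed at training time."""
    mean = _broadcast(running_mean, x.ndim, axis)
    var = _broadcast(running_var, x.ndim, axis)
    return (x - mean) / np.sqrt(var + eps)

def norm_per_song(x, axis=FREQ_AXIS, eps=1e-5):
    """Hypothetical per-song normalization: each track gets its own mean/var for
    every bin along `axis`, computed over the remaining non-batch axes."""
    reduce_axes = tuple(a for a in range(1, x.ndim) if a != axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
songs = rng.normal(size=(2, 1, 96, 1366))  # two tracks; shape assumed
per_song = norm_per_song(songs)

# Each track is normalized independently of the rest of the batch...
print(np.allclose(per_song[:1], norm_per_song(songs[:1])))  # True
# ...and a per-track gain/offset is removed, which fixed running stats cannot do.
print(np.allclose(norm_per_song(10 * songs + 3), per_song, atol=1e-4))  # True
```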
So, with respect to what I intended (frequency-axis BN), a399872 is wrong, but it works well.
Hi, yes, it works in the sense that it does not crash. But in that same sense, axis=freq with mode=1 also works (at least during prediction). However, since I want to use the pre-trained model, I should use axis=time and mode=0, since I assume that is how the code was when you trained the model. Is that correct?
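One way to sanity-check which axis a set of saved weights was trained with is to compare the stored parameter length (the CPU log shows `bn_0_freq_gamma` with shape (1440,)) against the layer's input shape. A minimal sketch, assuming the (1, 1, 96, 1440) input shape from the logs; the helper is hypothetical and returns every matching axis, so an ambiguous result means the sizes coincide.

```python
def matching_bn_axes(input_shape, param_len, skip_batch=True):
    """Return the axes of `input_shape` whose size equals a saved BN parameter length.

    A batch-norm layer stores one gamma/beta/mean/var entry per position along
    its normalization axis, so the stored length hints at which axis was used.
    """
    start = 1 if skip_batch else 0
    return [axis for axis in range(start, len(input_shape)) if input_shape[axis] == param_len]

# 'th'-ordered input shape as it appears in the error logs: (batch, channel, freq, time).
logged_shape = (1, 1, 96, 1440)
print(matching_bn_axes(logged_shape, 1440))  # [3] -> time axis
print(matching_bn_axes(logged_shape, 96))    # [2] -> frequency axis
```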
I'm using Theano (0.8.2) and Keras (1.1.0) and got the same error with example_tagging. What should I do to make it work?
You should do:
Hi,
Do you know if the pre-trained CRNN model works for any combination of library versions? I've tried tensorflow, but I have seen another issue here saying that it is impossible to load tensorflow weights. I'm using Keras 1.2.0 and Theano 0.9.0 and I'm getting the following error output during model.predict():
ValueError: GpuReshape: trying to reshape an array of total size 1440 into an array of total size 96. Apply node that caused the error: GpuReshape{4}(bn_0_freq_running_mean, TensorConstant{[ 1 1 96 1]}) Toposort index: 39 Inputs types: [GpuArrayType<None>(float32, (False,)), TensorType(int64, vector)] Inputs shapes: [(1440,), (4,)] Inputs strides: [(4,), (8,)] Inputs values: ['not shown', array([ 1, 1, 96, 1])] Outputs clients: [[GpuElemwise{sub,no_inplace}(GpuIncSubtensor{Set;::, ::, int64:int64:, int64:int64:}.0, GpuReshape{4}.0)]]