Closed ParthaEth closed 7 years ago
Hello,
First of all: LOL, I'd never have thought it possible to get such a spaghetti monstrosity to run.
def selectIndices(x, indices):
    out = x[:, :, 0:1]
    for i in range(len(indices) - 1):
        out = K.concatenate([out, x[:, :, indices[i + 1]:indices[i + 1] + 1]], axis=-1)
    return out
This code is quadratic in the length of `indices` (though that alone probably won't explain a 10x time difference).
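To illustrate the cost argument, here is a NumPy sketch of the same pattern (NumPy standing in for the Keras backend; the reasoning is identical): concatenating inside the loop re-copies the entire accumulated result at every step, while collecting the slices first and concatenating once does each copy only one time.

```python
import numpy as np

# Quadratic: each np.concatenate copies the whole accumulated result again.
def select_indices_slow(x, indices):
    out = x[:, :, indices[0]:indices[0] + 1]
    for i in indices[1:]:
        out = np.concatenate([out, x[:, :, i:i + 1]], axis=-1)
    return out

# Linear: gather the slices first, concatenate once at the end.
def select_indices_fast(x, indices):
    return np.concatenate([x[:, :, i:i + 1] for i in indices], axis=-1)

x = np.arange(2 * 3 * 5).reshape(2, 3, 5)
assert np.array_equal(select_indices_slow(x, [0, 2, 4]),
                      select_indices_fast(x, [0, 2, 4]))
```

Both versions produce the same result; only the number of copies differs.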
You should probably look for a way to unit test some bite-sized chunks.
You should probably refactor it to make the graph construction more apparent and not intertwined with layer construction. That will reduce the probability of bugs, and you will be able to unit test it on simpler graphs (say, two edges and one node). Then generalize it to a bigger graph, and the bug shouldn't reappear.
@unrealwill Thanks for the general tips. I understand what you mean: basically divide and conquer. But the thing is, without completely constructing the graph, how would anyone guess the expected speed? I'll give it a try anyway. The network won't make sense and won't be accurate, but I can see when the slowdown happens.
I also hate the Lambda layers, especially because of the way I pick the desired index. It should have been as simple as out = tensor[:, :, idxs],
but for some strange reason this doesn't work. Then again, these layers are not the bottleneck: I know this because I replaced them with Dense layers matching the sizes, and the network was still slow.
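For what it's worth, indexing with a list of indices does work in plain NumPy; it was the old TensorFlow backend that lacked this kind of fancy indexing (workarounds at the time typically involved `tf.gather`, which originally only gathered along axis 0). A quick NumPy check of the intended semantics, for illustration:

```python
import numpy as np

x = np.arange(2 * 3 * 5).reshape(2, 3, 5)
idxs = [0, 2, 4]

# What the Lambda layer is trying to do, expressed as NumPy fancy indexing:
picked = x[:, :, idxs]
assert picked.shape == (2, 3, 3)

# Equivalent to taking each slice separately and joining along the last axis:
assert np.array_equal(
    picked,
    np.concatenate([x[:, :, i:i + 1] for i in idxs], axis=-1))
```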
To add some more information: the GPU usage stays low during the run, i.e. although Volatile GPU-Util spikes frequently, most of the time it stays below 10% (inspected through nvidia-smi).
Finally, I can print out the model as a .png and it looks right! So I see no reason to believe there is a bug on my side; and if that is the case, why is it so slow? This is where I am stuck. :) The model image is attached here in case anyone is interested.
Have you tried the correction I hinted?
(I haven't even tried to run it, but this version is not quadratic.)
def selectIndices(x, indices):
    slices = [x[:, :, 0:1]]
    for i in range(len(indices) - 1):
        slices.append(x[:, :, indices[i + 1]:indices[i + 1] + 1])
    return K.concatenate(slices, axis=-1)
@unrealwill I have replaced the Lambda layers with TimeDistributed(Dense) layers to check whether the Lambdas are slowing things down. But that is not the case.
I'm not sure your replacement with TimeDistributed(Dense) tells you much (it's not a cheap operation itself).
Also, in your code,
both_hands ... mode='**sum**', **concat_axis**=-1
and the next line are confusing: specifying **concat_axis** together with mode='**sum**' looks contradictory.
RNNs do not exploit parallel computing well, as they are fundamentally sequential. Increasing your batch_size may help you obtain higher GPU utilization.
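As an aside, the batching point can be sketched in NumPy (sizes below are illustrative, not from the thread): one batched matrix product does the same arithmetic as many per-sample products, but as a single large operation, which is what a GPU needs to stay busy.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))      # stand-in for a recurrent weight matrix
batch = rng.standard_normal((32, 64))  # 32 samples processed together

per_sample = np.stack([x @ W for x in batch])  # many small ops: poor GPU utilization
batched = batch @ W                            # one large op: work is amortized
assert np.allclose(per_sample, batched)
```

The results are identical; only the granularity of the operations changes, and on a GPU that granularity is what drives utilization.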
What exactly changes between your 25M parameters and your 38M parameters?
@unrealwill The whole architecture. :) The 38M model is just 3 LSTM layers stacked on top of each other. I also noticed that my TensorFlow version is 0.10. Could that be the reason? I cannot upgrade it because I do not have sudo permission and the installed CUDA version is 7.5. I tried to build TensorFlow myself but did not manage to get it done.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Hi all,
I have the following model. It is an implementation of the human model from this paper.
This model has about 25M parameters but is around 10 times slower than a model that has 38M params. This is where I am confused. Any idea how I might debug this, or why it is legitimate?
`def getSRNN(batch_input_shape, W_regularizer_val, stateful, output_data_shape, spine_idx, l_arm_idx, r_arm_idx, l_leg_idx, r_leg_idx):