The repository code is correct, but I have a lot of doubts while trying to understand it. Maybe my doubts are very silly, as I recently started learning deep models.
Here are my doubts:
When we implement a single-layer LSTM model, how many LSTM units do we use? (I think it is one.)
What is the meaning of the following line?
cmd:option('-seq_length',50,'number of timesteps to unroll for')
Basically, I didn't get why we need "-seq_length".
Why do we need to clone the network multiple times?
-- make a bunch of clones after flattening, as that reallocates memory
clones = {}
for name, proto in pairs(protos) do
    print('cloning ' .. name)
    clones[name] = model_utils.clone_many_times(proto, opt.seq_length, not proto.parameters)
end
Do the cloned networks share parameters?
Why do we need MemoryFile() while cloning the network? In the following link, Cloning Network, there is a simple way to clone a network without using MemoryFile(). Is there any advantage to using MemoryFile() here?
After cloning the network multiple times, how do we distribute the data among the various clones? I am not able to understand the following code.
for t = 1, opt.seq_length do
    clones.rnn[t]:training() -- make sure we are in correct mode (this is cheap, sets flag)
    local lst = clones.rnn[t]:forward{x[t], unpack(rnn_state[t-1])} -- this is the line I don't understand
    rnn_state[t] = {}
    for i = 1, #init_state do table.insert(rnn_state[t], lst[i]) end -- extract the state, without output
    predictions[t] = lst[#lst] -- last element is the prediction
    loss = loss + clones.criterion[t]:forward(predictions[t], y[t])
end
I searched the internet a lot about this but could not find any good resource for understanding it. I know this is not really an issue, but I am still posting it here. Please help me understand it.
Thanks in advance.
A unified seq_length makes it convenient to forward a batch.
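To illustrate (a hypothetical Python sketch, not the repository's actual data loader): with a fixed seq_length, one long stream of tokens can be chopped into equal-length (input, target) chunks for next-token prediction, and every chunk then fits the same unrolled network.

```python
# Hypothetical sketch: why a fixed seq_length is convenient for batching.
# Chop one long token stream into equal-length (input, target) chunks,
# where each target is the input shifted by one step (next-token prediction).
def make_chunks(tokens, seq_length):
    chunks = []
    # we need seq_length inputs plus one extra token for the final target
    for start in range(0, len(tokens) - seq_length, seq_length):
        x = tokens[start : start + seq_length]
        y = tokens[start + 1 : start + seq_length + 1]
        chunks.append((x, y))
    return chunks

stream = list(range(11))      # stand-in for a character stream
chunks = make_chunks(stream, 5)
# every chunk has exactly seq_length elements, so the network unrolled
# into seq_length clones can consume any of them without reshaping
```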
The proto.rnn here is just a feed-forward network for one time step; we need to manually copy it 'seq_length' times so that it works recurrently over the whole sequence.
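As a toy Python sketch (the weights and update rule below are made up, just to show the shape of the idea): a one-timestep function plays the role of proto.rnn, and looping it seq_length times plays the role of the clones.

```python
# Hypothetical sketch of "one feed-forward step, applied recurrently".
# step() stands in for the one-timestep network (like proto.rnn);
# the loop stands in for the seq_length clones.
def step(x_t, h_prev, w_x=0.5, w_h=0.3):
    # a toy recurrent update; a real LSTM has gates, this only shows the shape
    return w_x * x_t + w_h * h_prev

def unrolled_forward(xs, h0=0.0):
    h = h0
    states = []
    for x_t in xs:          # one "clone" per timestep
        h = step(x_t, h)    # the SAME weights are reused at every step
        states.append(h)
    return states
```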
Yes, the clones share parameters; if they did not, there would be no recurrent structure in the model.
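A minimal Python analogy of the sharing (the Cell class is made up for illustration; it is not Torch's API): every clone is a separate object with its own per-timestep state, but all clones reference the same underlying parameter storage, so there is only one set of weights regardless of how far the network is unrolled.

```python
# Hypothetical sketch of the parameter sharing done by clone_many_times:
# each clone is a distinct object (its own buffers/activations),
# but all clones point at the SAME underlying weights.
class Cell:
    def __init__(self, params):
        self.params = params      # shared reference, not a copy
        self.buffer = []          # per-clone state is NOT shared

shared = {"w": 1.0}
clones = [Cell(shared) for _ in range(3)]

clones[0].params["w"] = 2.0
# the update made through one clone is visible through every other clone,
# which is exactly what makes the unrolled network one recurrent model
```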