stanfordnlp / treelstm

Tree-structured Long Short-Term Memory networks (http://arxiv.org/abs/1503.00075)
GNU General Public License v2.0
875 stars 236 forks

mini-batch in forward computing? #15

Open cyxtj opened 6 years ago

cyxtj commented 6 years ago

I'm working on Tree-LSTMs. My questions are:

  1. Can the Torch implementation of Tree-LSTM in this repo do mini-batching during the forward pass? (A rough sketch of what I have in mind follows this list.)
  2. If not, has anyone seen a mini-batched implementation of Tree-LSTM in Theano/TensorFlow/PyTorch/MXNet/DyNet/TF-Fold... or any other framework, or even C++?
  3. For a "dynamic" framework like Torch, would mini-batching actually speed up the computation?
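
To make question 1 concrete, below is a rough sketch of the kind of level-wise batching I have in mind. It is only a toy composition function, not the actual Tree-LSTM cell from the paper, and all names (`tree.levels`, `tree.children`, `tree.num_nodes`, `W`, `U`) are placeholders:

      -- Toy sketch of level-wise batching over a single tree (NOT the real Tree-LSTM cell).
      -- Nodes at the same depth only depend on their children, so once the deeper levels
      -- are computed, a whole level can be composed with one batched matrix multiply
      -- instead of one call per node.
      require 'torch'

      local function level_batched_forward(tree, inputs, W, U)
        local dim = W:size(1)
        local h = torch.zeros(tree.num_nodes, dim)
        -- tree.levels lists node ids grouped by depth, deepest level first
        for _, ids in ipairs(tree.levels) do
          local idx = torch.LongTensor(ids)
          local x = inputs:index(1, idx)              -- (n_level, in_dim)
          local child_sum = torch.zeros(#ids, dim)
          for k, id in ipairs(ids) do
            for _, c in ipairs(tree.children[id] or {}) do
              child_sum[k]:add(h[c])                  -- sum of children's hidden states
            end
          end
          -- one batched affine transform for the whole level
          local h_level = torch.tanh(x * W:t() + child_sum * U:t())
          h:indexCopy(1, idx, h_level)
        end
        return h
      end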

I'm new to Torch, but looking at the code in sentiment/TreeLSTMSentiment.lua:

      local loss = 0
      for j = 1, batch_size do
        local idx = indices[i + j - 1]
        local sent = dataset.sents[idx]
        local tree = dataset.trees[idx]

        local inputs = self.emb:forward(sent)
        local _, tree_loss = self.treelstm:forward(tree, inputs)
        loss = loss + tree_loss
        local input_grad = self.treelstm:backward(tree, inputs, {zeros, zeros})
        self.emb:backward(sent, input_grad)
      end

I think the forward and backward passes are computed one sample at a time, and the gradients are accumulated over a mini-batch before the parameters are updated. So this is not mini-batch forward computation, but rather a mini-batch update.
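
If that reading is right, the pattern is roughly the following (a schematic sketch with placeholder names `model`, `criterion`, `batch`, `lr`, not the repo's actual update code):

      -- Schematic "mini-batch update": per-sample forward/backward passes,
      -- but only one parameter update per batch.
      require 'nn'

      local function train_batch(model, criterion, batch, lr)
        model:zeroGradParameters()                       -- clear accumulated gradients
        local loss = 0
        for _, sample in ipairs(batch) do
          local output = model:forward(sample.input)     -- one sample at a time
          loss = loss + criterion:forward(output, sample.target)
          local grad_out = criterion:backward(output, sample.target)
          model:backward(sample.input, grad_out)         -- gradients accumulate in the module
        end
        model:updateParameters(lr / #batch)              -- single update for the whole batch
        return loss / #batch
      end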

@kaishengtai Could you take a look and help clarify?