gradOutput is the derivative of the loss with respect to the Gram matrix, in other words, dE/dGij or dE/dG. That's computed in StyleLoss:updateGradInput and named dG.
gradInput is the derivative of the loss with respect to the feature map, in other words, dE/dFij or dE/dF. That's what the function needs to compute.
G = F * Ft. That's the Gram matrix computed in Gram:updateOutput.
The function computes dE/dF like so:
dE/dF = (dE/dG) F + (dE/dG)t F
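A quick way to convince yourself of that formula is a finite-difference check. The sketch below is plain Python, not the Torch code from the repo, and every name in it (`gram`, `loss`, `grad_input`, `numeric_grad`, `W`) is made up for illustration. It assumes a toy loss E = sum_ij W_ij * G_ij, so that dE/dG is exactly W, and picks W non-symmetric on purpose so that both terms of the formula matter.

```python
import random

def matmul(A, B):
    # Plain list-of-lists matrix product A * B.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def gram(F):
    # G = F * Ft, with F of shape (channels, positions).
    return matmul(F, transpose(F))

def loss(F, W):
    # Hypothetical toy loss: E = sum_ij W_ij * G_ij, hence dE/dG = W.
    G = gram(F)
    n = len(G)
    return sum(W[i][j] * G[i][j] for i in range(n) for j in range(n))

def grad_input(F, dG):
    # The formula under test: dE/dF = dG F + dGt F.
    a = matmul(dG, F)
    b = matmul(transpose(dG), F)
    return [[a[i][j] + b[i][j] for j in range(len(F[0]))]
            for i in range(len(F))]

def numeric_grad(F, W, eps=1e-6):
    # Central finite differences on every entry of F.
    out = [[0.0] * len(F[0]) for _ in F]
    for i in range(len(F)):
        for j in range(len(F[0])):
            old = F[i][j]
            F[i][j] = old + eps
            up = loss(F, W)
            F[i][j] = old - eps
            down = loss(F, W)
            F[i][j] = old  # restore the entry before moving on
            out[i][j] = (up - down) / (2 * eps)
    return out

random.seed(0)
C, N = 3, 5  # arbitrary small sizes: channels x positions
F = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(C)]
W = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C)]

analytic = grad_input(F, W)
numeric = numeric_grad(F, W)
max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(C) for j in range(N))
print("max abs difference:", max_err)
```

The printed difference should be tiny (rounding noise only), since the toy loss is quadratic in F and central differences are exact for quadratics up to floating-point error.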
Why bother computing dE/dG when you can compute dE/dF directly from G, G of target, and F?
We know that