@michaelauli @jgehring
The following is my loss-regularization code. I added the penalty term directly to crit.output; is this right for model regularization?
net:forward(sample.input)
crit:forward(net.output, sample.target)

-- grab the attention matrix produced by the self-attentive softmax
-- (the loop just keeps the output of the last such module)
local A
for _, b in ipairs(_G.model.selfattentivesoftmax) do
  A = b.output
end

local B = A:clone()
for i = 1, B:size(1) do                -- loop over the batch
  A = B[i]:clone()                     -- attention matrix of one sample
  local AAT = torch.mm(A, A:t())
  local I = torch.eye(A:size(1))
  local P = torch.norm(AAT - I, 2)     -- Frobenius norm of (A*A^T - I)
  local penal = P * P                  -- squared: ||A*A^T - I||_F^2
  penal = penal / A:size(2)
  crit.output = crit.output + _G.model.selfattentivelamda * penal
end

crit:backward(net.output, sample.target)
net:backward(sample.input, crit.gradInput)
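(For reference, the penalty I am trying to add is, if I read the structured self-attentive embedding paper correctly, the orthogonality term P = ||A*A^T - I||_F^2, which my code normalizes by A:size(2); here A is the attention matrix of one sample.)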
In the document:

-- Loss:
f = f + opt.coefL1 * norm(parameters,1)
f = f + opt.coefL2 * norm(parameters,2)^2/2

But my regularization is not L1 or L2 regularization. In my code above, A is the output of a single network layer, not the full parameter vector, so what should I do to write the regularization code correctly?
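Is the right fix to compute the penalty gradient myself and feed it back through the layer? Below is a minimal sketch of what I have in mind: an identity-style nn module inserted after the attention softmax, using d/dA ||A*A^T - I||_F^2 = 4*(A*A^T - I)*A (which holds because A*A^T - I is symmetric). The module name OrthoPenalty and the lambda argument are just my placeholders, not fairseq API.

```lua
require 'nn'

-- Sketch (my own code, not from fairseq): identity-like module that
-- passes its input through unchanged but adds the orthogonality
-- penalty gradient during backward.
local OrthoPenalty, parent = torch.class('nn.OrthoPenalty', 'nn.Module')

function OrthoPenalty:__init(lambda)
  parent.__init(self)
  self.lambda = lambda
  self.loss = 0
end

function OrthoPenalty:updateOutput(input)
  -- input: batch of attention matrices A, shape bsz x r x n
  self.loss = 0
  for i = 1, input:size(1) do
    local A = input[i]
    local diff = torch.mm(A, A:t()) - torch.eye(A:size(1)):typeAs(A)
    self.loss = self.loss + self.lambda * torch.norm(diff, 2)^2 / A:size(2)
  end
  self.output = input
  return self.output
end

function OrthoPenalty:updateGradInput(input, gradOutput)
  -- identity gradient plus the penalty gradient:
  -- d/dA ||A*A^T - I||_F^2 = 4 * (A*A^T - I) * A
  self.gradInput = gradOutput:clone()
  for i = 1, input:size(1) do
    local A = input[i]
    local diff = torch.mm(A, A:t()) - torch.eye(A:size(1)):typeAs(A)
    self.gradInput[i]:add(torch.mm(diff, A):mul(4 * self.lambda / A:size(2)))
  end
  return self.gradInput
end
```

I would then insert this module right after the self-attentive softmax, and after net:forward add its .loss field to crit.output for reporting, so that crit:backward/net:backward carry the penalty gradient automatically. Would that be the right direction?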