aakash-saboo opened this issue 5 years ago
As it stands, yes, the code is readily usable with class-index targets and NLL loss.
In fact, I believe you can implement your model the same way we did for the regression case. First, use linear output layers (without an activation function), like `LinearRegressor`. Second, the `criterion` of `AlraoModel` should be set to `MultiLabelSoftMarginLoss` and `task` to `'regression'`. Third, you have to write your own final loss, based on `L2LossAdditional` in `custom_layers.py`. This loss should aggregate the outputs of all the last layers according to their weights and return the 'weighted mean' loss over them. It should take as input the tensor of the outputs of all last layers, the log-probabilities associated with each last layer, and the targets, and it should return the aggregated log-probabilities for each class you consider.
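Roughly, such a loss might look like the following sketch. The function name, tensor shapes, and aggregation below are only my illustration of the idea, not the actual `L2LossAdditional` code:

```python
import torch
import torch.nn.functional as F

def multilabel_loss_additional(last_outputs, log_posterior, targets):
    """Hypothetical aggregated loss, modelled on L2LossAdditional.

    last_outputs:  (n_last_layers, batch, n_classes), raw outputs of
                   every last layer (no activation applied)
    log_posterior: (n_last_layers,), switch log-probabilities of each
                   last layer
    targets:       (batch, n_classes), multi-hot label vectors
    """
    # One multilabel loss per last layer.
    per_layer = torch.stack([
        F.multilabel_soft_margin_loss(out, targets) for out in last_outputs
    ])
    # Weighted mean of the per-layer losses under the switch posterior.
    return (log_posterior.exp() * per_layer).sum()
```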
This may not be clear, but I will try to provide a full example with user-defined losses as soon as I can. Meanwhile, you can take a look at the 'regression' case and the related functions (everything is in `main_reg.py`, `custom_layers.py` and `switch.py`).
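As a standalone check of the second step, note that `MultiLabelSoftMarginLoss` is applied directly to raw linear outputs, which is why no activation function is needed on the last layers (plain PyTorch snippet, independent of the Alrao code):

```python
import torch
import torch.nn as nn

# MultiLabelSoftMarginLoss applies the sigmoid internally, so the
# last layers should return raw logits, with no activation on top.
criterion = nn.MultiLabelSoftMarginLoss()

logits = torch.randn(4, 10)                      # (batch, n_classes)
targets = torch.randint(0, 2, (4, 10)).float()   # multi-hot labels
loss = criterion(logits, targets)                # scalar, averaged over classes
```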
Hi Pierre! Thanks for such a quick response. I did figure out the regression route, but I didn't consider changing the final loss as you mentioned. Instead, what I did is the following:
- I commented out the `self.task == 'classification'` condition in the `Supdate` function of `switch.py`, because the problem is multi-label classification and I did not want to compute accuracy.
- I commented out `x = F.log_softmax(x, dim=1)` in `class LinearClassifier(nn.Module)`, because I don't need log-softmax probabilities (see the sketch after this list).
- I changed the criterion as you said above.
- I changed the VGG implementation so that it can use images other than 32x32.
- I made some modifications to the `train()` function as well, so that anything related to accuracy is commented out.
- I DID NOT make any modification to the final loss, as you said I SHOULD.
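For reference, my modified head is roughly the following (a sketch; the real `LinearClassifier` in `custom_layers.py` may differ in its details):

```python
import torch.nn as nn
import torch.nn.functional as F

class LinearClassifier(nn.Module):
    """Sketch of the modified last layer: the log_softmax line is
    commented out, since MultiLabelSoftMarginLoss applies its own
    sigmoid and expects raw logits."""

    def __init__(self, in_features, n_classes):
        super().__init__()
        self.fc = nn.Linear(in_features, n_classes)

    def forward(self, x):
        x = self.fc(x)
        # x = F.log_softmax(x, dim=1)  # removed for multi-label targets
        return x
```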
I checked it, and the value of the final loss was of the same order as without ALRAO, so I thought everything was working fine. I will do what you said above and report back soon. Meanwhile, I would ask you to check whether the approach I took will give incorrect results or not; I couldn't spend much time on debugging. I'll appreciate it a lot.
Thanks
Thank you for your interest.
I assume that you are using `MultiLabelSoftMarginLoss`.
In principle, your method is valid. In fact, `MultiLabelSoftMarginLoss` returns a negative log-likelihood scaled by a factor of 1/nClasses, which should work (and seems to work, according to your experiments).
However, the 1/nClasses scaling is likely to distort the behaviour of the switch part: the averaging over the last layers may be pulled a little too much towards a uniform average. Still, the effect should be very small.
If you encounter a problem linked to it, I suggest you remove the 1/nClasses scaling in the loss passed as an argument to `AlraoModel` (not in the global loss defined in your main file). Another way to proceed is to modify the switch parameter `theta`, but that could be very tricky.
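Concretely, removing the scaling could be as simple as wrapping the loss; the helper below is hypothetical, not code from the repository:

```python
import torch.nn.functional as F

def unscaled_multilabel_loss(output, target):
    # multilabel_soft_margin_loss averages over the classes (the
    # 1/nClasses factor); multiplying back by n_classes recovers the
    # unscaled negative log-likelihood.
    n_classes = target.size(1)
    return n_classes * F.multilabel_soft_margin_loss(output, target)
```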
I hope this information is helpful, and I am very interested in your feedback.
I changed the loss to `MultiLabelSoftMarginLoss`. My input is a regular 4-dimensional image array and the targets are one-hot encoded vectors (with vector length = n_classes). I am performing multi-label classification, and when I dug into the code, I found comments specifying 'log probabilities' as the only output of the linear layer (which is mainly used for the NLL loss).
Can you clarify whether the code is only valid for class-index targets and NLL loss?
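For example, my inputs and targets look like this (illustrative shapes only):

```python
import torch

# Two samples, five classes. Each target row is a multi-hot vector:
# sample 0 belongs to classes 1 and 3, sample 1 to class 0 only.
targets = torch.tensor([[0., 1., 0., 1., 0.],
                        [1., 0., 0., 0., 0.]])
images = torch.randn(2, 3, 224, 224)  # regular 4-dimensional image batch
```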