xyutao / fscil

Official repository for Few-Shot Class-Incremental Learning (FSCIL)

About loss Function #17

Open lry124 opened 3 years ago

lry124 commented 3 years ago

Hello, I am reproducing the method from the paper (using the CIFAR100 dataset and the quick base net), but I am a little confused now:

  1. What is the centroid vector in the NG network you chose? Is it the output of the softmax layer? If I choose the softmax output as my centroid vector (100 dims), the diagonal matrix A's values are very small, e.g. 0.000000673, so the computed A⁻¹ (the inverse of A) becomes very large, and that is where my problem starts.

  2. What is the approximate value of the AL loss? Why is my AL_loss value extremely large?

  3. About the MML loss, the paper says: ''Given the new class training set D^(t) and NG G^(t), for a training sample (x, y) ∈ D^(t), we extract its feature vector f = f(x; θ^(t)) and feed f to the NG. We hope f matches the node v_j whose label c_j = y, and d(f, m_j) ≤ d(f, m_i), ∀ i ≠ j, so that x is more probable to be correctly classified.'' Here f is a feature of a new-class sample. How can I find a node whose label c_j equals f's label y, given that f comes from the new classes' data?
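For what it's worth, the matching step in that quoted passage can be sketched as a label-restricted nearest-node lookup plus a min-max-style penalty. This is only an illustration of the quoted text, not the repository's actual implementation; the squared Euclidean distance and the margin value are my assumptions:

```python
import numpy as np

def mml_loss(f, centroids, labels, y, margin=1.0):
    """Min-max-style loss sketch for one sample.

    f         : (D,) feature vector of the sample
    centroids : (N, D) NG node centroid vectors m_i
    labels    : (N,) class label c_i of each node
    y         : ground-truth class of the sample
    """
    d = np.sum((centroids - f) ** 2, axis=1)   # d(f, m_i) for every node
    same = labels == y
    d_match = d[same].min()                    # closest node with c_j = y
    d_other = d[~same].min()                   # closest node of any other class
    # encourage d(f, m_j) + margin <= d(f, m_i) for all i != j
    return max(0.0, d_match - d_other + margin)
```

The loss is zero once the correct-label node is closer than every wrong-label node by at least the margin.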

xyutao commented 3 years ago

Hi @lry124, I don't quite understand your question:

What is the softmax layer you refer to? In the paper, we treat the CNN as a feature extractor followed by a classification layer, and the neural gas is trained with the extracted feature vectors. For the ResNet18 thumbnail version on CIFAR100, the features should be 64-dim; for CUB200, 512-dim. I don't know why you have a 100-dim centroid vector.
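On the near-zero diagonal values from question 1: if A is built from per-dimension statistics of the feature vectors, any dimension with almost no variance makes A⁻¹ blow up. A common safeguard (my assumption, not something stated in the paper) is to floor the diagonal with a small ε before inverting:

```python
import numpy as np

def inv_diag_metric(features, eps=1e-6):
    """Regularized inverse of a diagonal matrix A whose entries are the
    per-dimension variances of the extracted feature vectors.

    features : (N, D) array of feature vectors
    returns  : (D,) diagonal of A^{-1}, kept finite by the eps floor
    """
    var = features.var(axis=0)    # diagonal of A
    return 1.0 / (var + eps)      # diagonal of A^{-1}, never inf/NaN
```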

For MML loss, when learning new classes, we insert k (e.g. k=1) new NG nodes for each new class, as mentioned in section 3.3.
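A sketch of that insertion step, assuming the new node's centroid is initialized from the mean of its class's few-shot features (the paper may use a different initialization):

```python
import numpy as np

def insert_new_nodes(centroids, labels, new_feats, new_labels, k=1):
    """Insert k NG nodes per new class; each centroid is the mean of that
    class's few-shot feature vectors (an assumed initialization).

    centroids  : (N, D) existing node centroids
    labels     : (N,)   existing node labels
    new_feats  : (M, D) features of the new-class training samples
    new_labels : (M,)   class of each new sample
    """
    for c in np.unique(new_labels):
        proto = new_feats[new_labels == c].mean(axis=0, keepdims=True)
        for _ in range(k):
            centroids = np.vstack([centroids, proto])
            labels = np.append(labels, c)
    return centroids, labels
```

With nodes for the new classes present, the label-matching step in the MML quote is well defined even though f comes from new-class data.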

lry124 commented 3 years ago

Thank you very much for your reply. The paper says: ''The classification head with the parameter set φ produces the output vector, followed by a softmax function to predict the probability p over all classes.'' CIFAR100 has 100 classes in total, so I thought the vector's dimension is 100.

xyutao commented 3 years ago

The operations (NG, AL, MML) are not applied to the output vector, but to the feature vector before the classification head.
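To make the dimension point concrete: with the thumbnail ResNet18 on CIFAR100, the feature vector is 64-dim, and only the linear classification head maps it to 100 logits. A toy numpy sketch of the shapes (random values stand in for the CNN backbone):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(64)                 # feature vector: NG/AL/MML act here
W = rng.standard_normal((100, 64))          # classification head weights
b = np.zeros(100)
logits = W @ f + b                          # output vector of the head
p = np.exp(logits) / np.exp(logits).sum()   # softmax over the 100 classes
print(f.shape, logits.shape)                # (64,) (100,)
```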