abdikaiym01 opened this issue 2 years ago
There is something I've thought about:
The class labels used for training are actually not the same as the folder names, so counting sample_per_class from the raw folders would be a problem. You may try something like:
import data
import numpy as np
import pandas as pd

dataset_path = '/datasets/faces_casia_112x112_folders'
# pre_process_folder returns the actual training labels, not the raw folder names
image_names, image_classes, embeddings, classes, dest_pickle = data.pre_process_folder(dataset_path)
# Count samples per class label, ordered by label index
aa = pd.value_counts(image_classes).sort_index().values
# Save to npy
np.save('faces_casia_sample_per_class.npy', aa)
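As a quick sanity check (just my suggestion, not part of the repo), the counts should add up to the total number of images:

# Optional: every image should be counted exactly once
assert aa.sum() == len(image_classes)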
Then load it back in the loss function:
import numpy as np
import tensorflow as tf

sample_per_class = np.load('faces_casia_sample_per_class.npy')
sample_per_class = np.log(sample_per_class)
sample_per_class = tf.convert_to_tensor(sample_per_class.astype('float32'))
...
# Apply other steps as `balanced_softmax_loss`
logits += sample_per_class
...
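For reference, the whole adjustment could look roughly like this. It's just a sketch following the Balanced Meta-Softmax paper; the function name and signature here are mine, not the repo's, and it takes raw per-class counts rather than the pre-logged tensor above:

import tensorflow as tf

def balanced_softmax_loss(y_true, logits, sample_per_class):
    # Balanced Softmax: shift each class logit by log(n_c) before softmax,
    # so head classes need a larger raw logit to dominate the softmax
    logits = logits + tf.math.log(tf.cast(sample_per_class, logits.dtype))
    return tf.keras.losses.categorical_crossentropy(y_true, logits, from_logits=True)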
The logits for CosFace or ArcFace are in the value range [-1, 1], but even for a small dataset like CASIA, the log values of sample_per_class are in the range [0.6931472, 6.6871085], with most of them in [2.7, 3.7], which is too large...
aa = np.load('faces_casia_sample_per_class.npy')
aa = np.log(aa)
# How many classes share each log(count) value, most common first
pd.value_counts(aa).head(20)
# 2.708050 556
# 2.772589 514
# 2.833213 495
# 2.890372 444
# 2.944439 402
# 3.044522 352
# 2.995732 350
# 2.639057 315
# 3.091042 300
# 3.135494 290
# 3.178054 258
# 3.258097 251
# 3.218876 247
# 3.295837 236
# 3.401197 202
# 3.367296 198
# 3.332205 191
# 3.433987 164
# 2.564949 160
# 3.465736 157
Ok, I have some troubles with the convergence. What would you suggest to make it work well, since some datasets are very long-tailed?
I don't have much experience in handling long-tailed classification either.
Speaking of this Meta-Softmax, I think the original should work better with a plain softmax loss. With CosFace loss, it may be better to use * instead of + when combining logits with sample_per_class:
class CosFaceLoss(ArcfaceLossSimple):
    def __init__(self, margin=0.35, scale=64.0, from_logits=True, label_smoothing=0, sample_per_class=None, **kwargs):
        super(CosFaceLoss, self).__init__(margin, scale, from_logits, label_smoothing, **kwargs)
        self.sample_per_class = sample_per_class

    def call(self, y_true, norm_logits):
        if self.batch_labels_back_up is not None:
            self.batch_labels_back_up.assign(tf.argmax(y_true, axis=-1))
        # Subtract the CosFace margin from the target-class logits only
        pick_cond = tf.cast(y_true, dtype=tf.bool)
        logits = tf.where(pick_cond, norm_logits - self.margin, norm_logits)
        if self.sample_per_class is not None:
            logits *= self.sample_per_class  # * or +
        logits *= self.scale
        return tf.keras.losses.categorical_crossentropy(y_true, logits, from_logits=self.from_logits, label_smoothing=self.label_smoothing)
Anyway, I haven't actually tested this...
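If you try it, usage would be something like the following. This is hypothetical: it assumes an already-built Keras model named `model` and the .npy file saved above:

import numpy as np
import tensorflow as tf

# Load the precomputed per-class counts and take the log, as earlier
sample_per_class = np.load('faces_casia_sample_per_class.npy')
sample_per_class = tf.convert_to_tensor(np.log(sample_per_class).astype('float32'))

# Pass the tensor into the loss when compiling
model.compile(optimizer='adam', loss=CosFaceLoss(sample_per_class=sample_per_class))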
Hello @leondgarse, I've implemented Balanced Meta-Softmax (https://github.com/jiawei-ren/BalancedMetaSoftmax) with the CosFace loss function, but I have a problem with convergence. Test metrics like AgeDB fall down. Could you implement this in your great framework?