I will take a look, thanks for the reminder. I'm currently occupied with something else. For this project, I've been trying partialFC on Glint360K recently, and the training is taking a long time...
I've seen that some people used gradient accumulation when training CLIP multi-modal pre-training models; take a look and see whether it can help you.
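For context, gradient accumulation sums gradients over several small batches and applies them once, simulating a larger batch. A minimal sketch in a custom TF training step, with all names (`model`, `loss_fn`, `accum_steps`) illustrative rather than from this repo:

```py
import tensorflow as tf

# Illustrative model / optimizer / loss, only to make the sketch runnable.
model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.build([None, 20])  # build so trainable_variables exist
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
accum_steps = 4  # effective batch size = batch_size * accum_steps

# One zeroed accumulator per trainable variable.
accum_grads = [tf.Variable(tf.zeros_like(vv), trainable=False) for vv in model.trainable_variables]

def train_step(xx, yy, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(yy, model(xx, training=True)) / accum_steps  # scale so summed grads average out
    for accum, grad in zip(accum_grads, tape.gradient(loss, model.trainable_variables)):
        accum.assign_add(grad)
    if (step + 1) % accum_steps == 0:  # apply once every accum_steps batches, then reset
        optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
        for accum in accum_grads:
            accum.assign(tf.zeros_like(accum))
```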
The partialFC implementation results have been updated in Training on large datasets with a lot of identities #90.

Working on it, working on it: the r100 models for webface4m and webface12m have already been converted; once the validations run I'll start writing the loss function.
AdaFaceLoss updated: r100 webface4m and r100 webface12m. AdaFaceLoss has only been run for a few batches to confirm that the loss converges during training; a full training run hasn't been done yet.

Both adaface and magface have a way of reflecting face image quality; computing a norm or something like that is enough. Have you looked into that?
Right, Converted MagFace / AdaFace r50 / r100 model and face quality testing #57 is exactly that: the norm value is used as the face quality score, tested on cfp_fp / agedb_30.
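As a minimal sketch of that idea (assuming a `basic_model` that outputs the embedding before l2 normalization; the function name is illustrative):

```py
import tensorflow as tf

def face_quality_by_norm(basic_model, images):
    # MagFace / AdaFace both tie feature magnitude to image quality,
    # so the l2 norm of the raw embedding can serve as a quality proxy.
    embeddings = basic_model(images, training=False)  # [batch, emb_size], not l2-normalized
    return tf.norm(embeddings, axis=-1)               # larger norm -> presumably higher quality
```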
You work so fast! Is there a model? The one I trained myself doesn't seem to work at all; why do blurry faces score higher than good ones?

Oh, after refreshing I can see the model now, I'll give it a try.
The face quality test doesn't feel as good as MagFace's; I still need to read the paper to see how they actually use it.

The EffV2S MagFace in the Readme is one I trained myself; its quality-test results look acceptable.
There is also QMagFace: Simple and Accurate Quality-Aware Face Recognition, which takes the MagFace output and trains a face quality model on top of it.
B.1. Correlation between Norm and BRISQUE during Training
I feel adaface's norm value can't be used to judge face quality.

Regarding safe_norms = safe_norms.clone().detach(): I feel the whole margin computation should be placed inside tf.stop_gradient, which also matches the paper's description that the gradient doesn't flow to ∥z_i∥.
adaface_ir101_webface*m_rgb.h5: takes RGB input; fixed the accuracy on the evaluation datasets.

1. How do you port a torch model into TF format? Do you have to re-implement the code in the other framework and retrain, or is pulling the weights out enough? 2. Is there any way to pin the random initialization values in the two frameworks, to judge whether a reproduction is consistent?
keras_cv_attention_models.

loss=nan; re-running with sgd / sgdw.

You are my god!

No need to go that far.
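On question 1: pulling the weights out is enough, no retraining; the heart of a conversion is reordering each weight tensor into the target layout, e.g. PyTorch conv kernels are `[out, in, kh, kw]` while Keras expects `[kh, kw, in, out]`. A minimal hand-rolled sketch of that remap (toy layers for illustration, not the repo's `download_and_load` pipeline):

```py
import tensorflow as tf
import torch

# Toy layer pair, only to show the layout remap.
torch_conv = torch.nn.Conv2d(3, 8, kernel_size=3, bias=True)
keras_conv = tf.keras.layers.Conv2D(8, 3, use_bias=True)
keras_conv.build([None, 112, 112, 3])

ww = torch_conv.weight.detach().numpy()            # [out, in, kh, kw]
bb = torch_conv.bias.detach().numpy()
keras_conv.set_weights([ww.transpose(2, 3, 1, 0),  # -> [kh, kw, in, out]
                        bb])
```

On question 2: rather than pinning random initializations, the simpler check is to load the same weights into both frameworks and compare outputs on an identical input.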
I put download_and_load and test_images from keras_cv_attention_models into the project, then also added net and head from adaface. The downloaded ckpt is "adaface_ir101_webface4m.ckpt", but I got an error during conversion:
```
====================
stack1_block1_shortcut_conv
Traceback (most recent call last):
  File "/data/xixi/project/Github/Keras_insightface/torch_model_conversion.py", line 21, in <module>
    download_and_load.keras_reload_from_torch_model(
  File "/data/xixi/project/Github/Keras_insightface/download_and_load.py", line 311, in keras_reload_from_torch_model
    keras_reload_stacked_state_dict(keras_model, stacked_state_dict, aligned_names, additional_transfer, save_name=save_name)
  File "/data/xixi/project/Github/Keras_insightface/download_and_load.py", line 166, in keras_reload_stacked_state_dict
    torch_weight[0] = np.transpose(torch_weight[0], (2, 3, 1, 0))
  File "<__array_function__ internals>", line 180, in transpose
  File "/home/nlp/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 660, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/home/nlp/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
ValueError: axes don't match array
```
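The ValueError itself is just numpy refusing a 4-axis permutation on an array that isn't 4-D; a hypothetical repro with a 1-D weight:

```py
import numpy as np

ww = np.ones([512])             # e.g. a 1-D bias / BN weight instead of a 4-D conv kernel
np.transpose(ww, (2, 3, 1, 0))  # ValueError: axes don't match array
```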
Specifically, in the 'stack1_block1_shortcut_conv' layer the weight dimensions are not those of an ordinary convolution, so the transpose can't be applied. Is there anything else I need to change?
The code I used is:

```py
import numpy as np
import models             # from Keras_insightface
import download_and_load  # copied over from keras_cv_attention_models

mm = models.buildin_models('r100', output_layer='E', activation="PReLU", bn_momentum=0.9, bn_epsilon=1e-5, use_bias=True, scale=False, use_max_pool=True)
tail_align_dict = {"shortcut_conv": -4, "shortcut_bn": -5}
full_name_align_dict = {"E_batchnorm": 3, "E_dense": 4, "pre_embedding": 5}
# [25088, 512] -> CHW + out [512, 7, 7, 512] -> HWC + out [7, 7, 512, 512] -> [25088, 512]
additional_transfer = {
    "E_dense": lambda ww: [ww[0].reshape(512, 7, 7, 512).transpose([1, 2, 0, 3]).reshape([-1, 512]), ww[1]],
    "pre_embedding": lambda ww: [np.zeros(512), *ww],
}
download_and_load.keras_reload_from_torch_model(
    'adaface_ir101_webface4m.ckpt',
    keras_model=mm,
    tail_align_dict=tail_align_dict,
    full_name_align_dict=full_name_align_dict,
    additional_transfer=additional_transfer,
    input_shape=(112, 112),
    do_convert=True,
    save_name="adaface_ir101_webface4m.h5",
)
```
Your Keras_insightface/backbones/resnet.py probably isn't updated: with use_max_pool=True, stack_1_block_1 has no shortcut_conv, which is exactly the resnet structure adaface uses. Update your Keras_insightface/backbones/resnet.py.
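As a hedged way to verify such a conversion (and question 2 from earlier), run the same input through both models and compare; `torch_model` and `mm` below stand for the loaded PyTorch and converted Keras models:

```py
import numpy as np
import torch

inputs = np.random.uniform(size=(1, 112, 112, 3)).astype("float32")
keras_out = mm(inputs).numpy()
torch_out = torch_model(torch.from_numpy(inputs.transpose(0, 3, 1, 2))).detach().numpy()  # NHWC -> NCHW
print(np.allclose(keras_out, torch_out, atol=1e-4))  # True if the conversion matches
```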
```py
import numpy as np
import tensorflow as tf

y_true = tf.one_hot(tf.random.uniform([32], 1, 10, dtype='int32'), 10)
y_pred = tf.random.uniform([32, 10])
y_pred_norm = tf.concat([y_pred, tf.norm(y_pred, axis=-1, keepdims=True)], axis=-1)

import losses
aa = losses.AdaFaceLoss()
print(aa(y_true, y_pred_norm))

import torch
import head  # from the AdaFace repo
from torch.nn import CrossEntropyLoss
bb = head.AdaFace(embedding_size=10, classnum=32)
cc = bb(torch.from_numpy(y_pred_norm[:, :-1].numpy()), torch.from_numpy(y_pred_norm[:, -1:].numpy()), torch.from_numpy(np.argmax(y_true, axis=-1)))
cross_entropy_loss = CrossEntropyLoss()
loss = cross_entropy_loss(cc, torch.from_numpy(np.argmax(y_true, axis=-1)))
print(loss)
```
Is it because the original head applies a randomly initialized fully-connected layer right at the start to compute the cosine for the margin loss, while the Keras loss already gets values you normalized beforehand, so the two are not numerically comparable? When judging a backprop reproduction, is it generally enough that the result is roughly similar, matches the paper, and converges, or should we find a way to strictly control the numbers?
Ah, you mean that. This comparison test needs a small code change:

In head.py, use the input embbedings directly as the cosine value:

```py
65 def forward(self, embbedings, norms, label):
66
67     # kernel_norm = l2_norm(self.kernel,axis=0)
68     # cosine = torch.mm(embbedings,kernel_norm)
69     # cosine = cosine.clamp(-1+self.eps, 1-self.eps) # for stability
70     cosine = embbedings
```

In losses.py, in AdaFaceLoss uncomment line 408 return arcface_logits, returning arcface_logits directly:

```py
408 return arcface_logits
409 # return tf.keras.losses.categorical_crossentropy(y_true, arcface_logits, from_logits=self.from_logits, label_smoothing=self.label_smoothing)
```
Test:

```py
import sys
import numpy as np
import tensorflow as tf

y_true = tf.one_hot(tf.random.uniform([32], 1, 10, dtype='int32'), 10)
y_pred = tf.random.uniform([32, 10])
y_pred_norm = tf.concat([y_pred, tf.norm(y_pred, axis=-1, keepdims=True)], axis=-1)

import losses
aa = losses.AdaFaceLoss()
aa(y_true, y_pred_norm)

sys.path.append('../AdaFace-master/')
import torch
import head
bb = head.AdaFace(t_alpha=0.01)
cc = bb(torch.from_numpy(y_pred_norm[:, :-1].numpy()), torch.from_numpy(y_pred_norm[:, -1:].numpy()), torch.from_numpy(np.argmax(y_true, axis=-1)))
print(f"{aa(y_true, y_pred_norm).numpy() = }, {cc.mean() = }")
# aa(y_true, y_pred_norm).numpy() = 30.912012, cc.mean() = tensor(30.9092)
```

Dividing out the 64x scale factor, the two values are essentially identical:

```py
print(f"{aa(y_true, y_pred_norm).numpy() / 64 = }, {cc.mean() / 64 = }")
# aa(y_true, y_pred_norm).numpy() / 64 = 0.4830001890659332, cc.mean() / 64 = tensor(0.4830)
```
Did your model conversion above succeed?

It worked! It was indeed the shortcut issue.
- Regarding the AdaFace head.py#L72 implementation safe_norms = safe_norms.clone().detach(): I feel the whole margin computation should be placed inside tf.stop_gradient, which also matches the paper's description that the gradient doesn't flow to ∥z_i∥.

```py
norm_mean = tf.stop_gradient(tf.math.reduce_mean(feature_norm))
samples = tf.cast(tf.maximum(1, feature_norm.shape[0] - 1), feature_norm.dtype)
norm_std = tf.stop_gradient(tf.sqrt(tf.math.reduce_sum((feature_norm - norm_mean) ** 2) / samples))  # std matching torch.std, N-1 denominator
self.batch_mean.assign(self.mean_std_alpha * norm_mean + (1.0 - self.mean_std_alpha) * self.batch_mean)
self.batch_std.assign(self.mean_std_alpha * norm_std + (1.0 - self.mean_std_alpha) * self.batch_std)
```
What specifically needs changing? I don't see much of a difference. Is the stop-gradient logic different between the two frameworks?

Updated now; since the training hadn't finished, this part hadn't been updated before.
In that case, would the equivalent PyTorch be

```py
with torch.no_grad():
    mean = safe_norms.mean().detach()
    std = safe_norms.std().detach()
    self.batch_mean = mean * self.t_alpha + (1 - self.t_alpha) * self.batch_mean
    self.batch_std = std * self.t_alpha + (1 - self.t_alpha) * self.batch_std
    margin_scaler = (safe_norms - self.batch_mean) / (self.batch_std + self.eps)  # 66% between -1, 1
    margin_scaler = margin_scaler * self.h  # 68% between -0.333, 0.333 when h: 0.333
    margin_scaler = torch.clip(margin_scaler, -1, 1)
```

or is it fine in torch to leave that outside?
I'm not that familiar with PyTorch, but judging from some articles, e.g. Difference between detach().clone() and clone().detach(), I think safe_norms = safe_norms.clone().detach() should be equivalent to putting all the computation involving safe_norms inside torch.no_grad; using the clone().detach() form may just be the surer way of cutting off the gradient.

This one explains it better: Detach, no_grad and requires_grad.

Feels right; it's probably just double insurance, or at most torch.no_grad does some memory optimization and runs a bit faster.
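A tiny sanity check of that reading: both clone().detach() and torch.no_grad yield tensors with no gradient history, so neither path contributes anything to backward:

```py
import torch

xx = torch.ones(3, requires_grad=True)
norm = xx.norm()

# Both paths block the gradient: neither result carries a grad_fn.
detached = norm.clone().detach()
with torch.no_grad():
    in_no_grad = norm + 0.0

print(detached.requires_grad, in_no_grad.requires_grad)  # False False

# Gradient still flows through the original `norm`, but not through the two copies.
loss = norm + detached + in_no_grad
loss.backward()
print(xx.grad)  # xx / ||xx|| from the direct `norm` term only
```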
The current results look pretty good; r50 + SGD + AdaFace, 53 epochs:
```py
import os
from tensorflow import keras
import losses, train, models
import tensorflow_addons as tfa

keras.mixed_precision.set_global_policy("mixed_float16")

data_basic_path = '/datasets/ms1m-retinaface-t1'
data_path = data_basic_path + '_112x112_folders'
eval_paths = [os.path.join(data_basic_path, ii) for ii in ['lfw.bin', 'cfp_fp.bin', 'agedb_30.bin']]

basic_model = models.buildin_models('r50', dropout=0.4, emb_shape=512, output_layer='E', bn_momentum=0.9, bn_epsilon=1e-5, scale=True, use_bias=False, activation='prelu', use_max_pool=True)
basic_model = models.add_l2_regularizer_2_model(basic_model, weight_decay=5e-4, apply_to_batch_normal=False)

tt = train.Train(data_path, eval_paths=eval_paths,
    save_path='TT_r50_max_pool_E_prelu_dr04_lr_01_l2_5e4_adaface_emb512_sgd_m09_bs512_ms1m_64_only_margin_SG_scale_true_bias_false_random_100.h5',
    basic_model=basic_model, model=None, lr_base=0.1, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-6, lr_warmup_steps=3,
    batch_size=512, random_status=100, eval_freq=4000, output_weight_decay=1)

# optimizer = tfa.optimizers.AdamW(learning_rate=1e-2, weight_decay=5e-4, exclude_from_weight_decay=["/gamma", "/beta"])
# optimizer = tfa.optimizers.SGDW(learning_rate=1e-2, weight_decay=5e-6, momentum=0.9, exclude_from_weight_decay=["/gamma", "/beta"])
optimizer = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
sch = [
    {"loss": losses.AdaFaceLoss(scale=64), "epoch": 53, "optimizer": optimizer},
]
tt.train(sch, 0)
```
| TAR@FAR | 1e-06 | 1e-05 | 0.0001 | 0.001 | 0.01 | 0.1 | AUC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r50 IJBB | 0.393379 | 0.91334 | 0.955501 | 0.970204 | 0.978773 | 0.986465 | 0.993366 |
| r50 IJBC | 0.888633 | 0.952702 | 0.969269 | 0.979496 | 0.985734 | 0.991052 | 0.995485 |
Results from the PyTorch training at 26 epochs:

| Arch | Dataset | Method | IJBB TAR@FAR=0.01% | IJBC TAR@FAR=0.01% |
| --- | --- | --- | --- | --- |
| R50 | WebFace4M | AdaFace | 95.44 | 96.98 |
| R50 | MS1MV2 | AdaFace | 94.82 | 96.27 |
I took a rough look, and it doesn't seem fast: for a single image, get_scaled_quality runs 100 forward passes and get_gradients then runs a backward pass. It doesn't look easy to integrate with the current implementation.
The AdaFace + r100 training results should be uploaded within the next few days; the 53-epoch result is IJBB 0.961636 / IJBC 0.972849, compared with PyTorch's 26-epoch IJBB 95.84 / IJBC 97.09.
Congratulations!!! So impressive!!! Then GhostNet alone will be enough for me! Take care of yourself!
The r100 training results have been uploaded; they can serve as a reference for training GhostNet.
FYI https://paperswithcode.com/paper/adaface-quality-adaptive-margin-for-face