vitalwarley / research


Reproduce "A Multi-Task Comparator Framework for Kinship Verification" #49

Closed vitalwarley closed 11 months ago

vitalwarley commented 11 months ago

41

https://arxiv.org/pdf/2006.01615

vitalwarley commented 11 months ago

You are about to run mtcf as a batch (9 trials)
  batch-size: [200, 400, 600]
  device: '0'
  end-lr: 0.0005
  l2-factor: 0.0002
  loss-log-step: 100
  num-epoch: [4, 8, 12]
  output-dir: exp
  root-dir: rfiw2021/Track1
  start-lr: 0.001
  train-dataset-path: rfiw2021/Track1/sample0/train_sort.txt
  val-dataset-path: rfiw2021/Track1/sample0/val_choose.txt
  weights: weights/ms1mv3_arcface_r100_fp16.pth
Continue? (Y/n)

Dispatched on RIG2. I'll post details on how I set it up shortly.
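The "9 trials" in the prompt is consistent with a Cartesian product of the two list-valued flags (3 batch sizes times 3 epoch counts). The sketch below only illustrates that expansion; `expand_grid` is a hypothetical helper written for this note, not the tool's own code.

```python
from itertools import product

# Flag values copied from the batch prompt above; list-valued flags are swept.
flags = {
    "batch-size": [200, 400, 600],
    "num-epoch": [4, 8, 12],
    "start-lr": 0.001,
    "end-lr": 0.0005,
}

def expand_grid(flags):
    """Return one dict per trial: the Cartesian product of all list-valued flags."""
    swept = {k: v for k, v in flags.items() if isinstance(v, list)}
    fixed = {k: v for k, v in flags.items() if not isinstance(v, list)}
    names = list(swept)
    trials = []
    for combo in product(*(swept[n] for n in names)):
        trial = dict(fixed)              # scalar flags are shared by every trial
        trial.update(zip(names, combo))  # one value per swept flag
        trials.append(trial)
    return trials

trials = expand_grid(flags)
print(len(trials))  # 9
```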

vitalwarley commented 11 months ago

The experiments did not converge, probably because of the loss I was using (CrossEntropyLoss). I changed the code to use BCEWithLogitsLoss, which is also the loss the paper actually specifies.
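For reference, a minimal sketch of what a binary cross-entropy on raw logits computes, written in plain Python so it runs without PyTorch (the corresponding PyTorch module is `torch.nn.BCEWithLogitsLoss`). It assumes one logit per face pair and a 0/1 kin label; the sample values are illustrative, not taken from the experiments.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# One logit per pair; target 1.0 = kin, 0.0 = non-kin (illustrative values).
logits = [2.0, -1.0, 0.0]
targets = [1.0, 0.0, 1.0]
loss = bce_with_logits(logits, targets)
print(round(loss, 4))
```

The stable form avoids overflow for large-magnitude logits, which a naive `log(sigmoid(x))` would not.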

vitalwarley commented 11 months ago

More epochs are needed.

image

You are about to stage trials for mtcf as a batch (9 trials)
  batch-size: [200, 400, 600]
  device: '0'
  end-lr: 0.0005
  l2-factor: 0.0002
  loss-log-step: 100
  num-epoch: [16, 20, 24]
  output-dir: exp
  root-dir: rfiw2021/Track1
  start-lr: 0.001
  train-dataset-path: rfiw2021/Track1/sample0/train_sort.txt
  val-dataset-path: rfiw2021/Track1/sample0/val_choose.txt
  weights: weights/ms1mv3_arcface_r100_fp16.pth

vitalwarley commented 11 months ago

In the plot above, epoch_acc comes from the training set, while epoch_auc comes from the validation set used to choose the threshold on z (the probability of a pair being kin or non-kin).
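To make the two validation quantities concrete, here is a small sketch that computes an AUC from scores and then picks a decision threshold. The thread does not say how the threshold is chosen, so maximizing TPR minus FPR (Youden's J) is an assumption, and `roc_auc`, `best_threshold`, and the sample scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_threshold(scores, labels):
    """Assumed criterion: the score threshold that maximizes TPR - FPR (Youden's J)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        # A pair is predicted "kin" when its score is at or above the threshold.
        tpr = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t

# Illustrative scores z for eight pairs; label 1 = kin, 0 = non-kin.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels), best_threshold(scores, labels))
```

The AUC is threshold-free, which is why it can be tracked per epoch, while the threshold is only needed once a hard kin/non-kin decision has to be made.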

vitalwarley commented 11 months ago

image

Indeed, more epochs helped. So did a larger batch size, when compared at the same epoch.

vitalwarley commented 11 months ago

You are about to stage trials for mtcf as a batch (9 trials)
  batch-size: [1024, 2048, 3072]
  device: '0'
  end-lr: 0.0005
  l2-factor: 0.0002
  loss-log-step: 100
  num-epoch: [50, 100, 150]
  output-dir: exp
  root-dir: rfiw2021/Track1
  start-lr: 0.001
  train-dataset-path: rfiw2021/Track1/sample0/train_sort.txt
  val-dataset-path: rfiw2021/Track1/sample0/val_choose.txt
  weights: weights/ms1mv3_arcface_r100_fp16.pth
Continue? (Y/n)

I think this will be enough to retrieve the best model and evaluate it on the remaining sets.

vitalwarley commented 11 months ago

Some experiments are still running, such as the one below.

image

However, they are unlikely to exceed 0.7. Since the authors do not report AUC, only accuracy on the test set, we cannot tell whether the reproduction was satisfactory in terms of results. On the other hand, in terms of architecture and hyperparameters, I believe it was not satisfactory. I list the reasons below.

Several options could still be implemented within this strategy, but I don't think it is worth the effort for now, given the gap between its AUC and that of SOTA2021.

vitalwarley commented 11 months ago

Results of the best model so far (one experiment is still running):

The authors achieved an accuracy of 0.736, so I was not able to reproduce their results.

Log validation

```
➜ ours git:(main) ✗ guild run mtcf:val root-dir=rfiw2021/Track1 dataset-path=rfiw2021/Track1/sample0/val.txt output-dir=exp weights=exp/best.pth operation:mtcf=`guild select -Fo mtcf -Sc --max 'epoch_auc'`
Refreshing flags...
WARNING: cannot import flags from train_fc.py: ModuleNotFoundError: No module named 'dataset' (run with guild --debug for details)
WARNING: cannot import flags from train_kv.py: ModuleNotFoundError: No module named 'dataset' (run with guild --debug for details)
You are about to run mtcf:val
  batch-size: 1024
  dataset-path: rfiw2021/Track1/sample0/val.txt
  device: '0'
  operation:mtcf: aa81ce57e78444e7880b5fa48cbeaa91
  output-dir: exp
  root-dir: rfiw2021/Track1
  weights: exp/best.pth
Continue? (Y/n)
Resolving file:weights
Resolving file:../rfiw2021/
Resolving file:models/insightface
Resolving operation:mtcf
Using run aa81ce57e78444e7880b5fa48cbeaa91 for operation:mtcf
2023-12-23 16:05:20.804376: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-23 16:05:20.806769: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-23 16:05:20.847495: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-23 16:05:20.847521: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-23 16:05:20.848638: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-23 16:05:20.855000: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-23 16:05:20.855205: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-23 16:05:21.648149: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Namespace(root_dir='rfiw2021/Track1', dataset_path='rfiw2021/Track1/sample0/val.txt', weights='exp/best.pth', output_dir=PosixPath('exp'), batch_size=1024, device='0', func=)
Current CUDA Device = 0
Device Name = NVIDIA GeForce RTX 3090
Loaded 129032 samples from rfiw2021/Track1/sample0/val.txt (with duplicated samples for same generation bb, ss, sibs).
Adding 1 negative samples per sample...
Added negative samples, now we have 258064 samples.
Validating...
██████████|253/253 [06:31<00:00, 1.55s/it]
auc: 0.685 | thresh: 0.155
```
Log test

```
➜ ours git:(main) ✗ guild run mtcf:test root-dir=rfiw2021/Track1 dataset-path=rfiw2021/Track1/sample0/test.txt output-dir=exp weights=exp/best.pth operation:mtcf=`guild select -Fo mtcf -Sc --max 'epoch_auc'` threshold=0.155
Refreshing flags...
WARNING: cannot import flags from train_fc.py: ModuleNotFoundError: No module named 'dataset' (run with guild --debug for details)
WARNING: cannot import flags from train_kv.py: ModuleNotFoundError: No module named 'dataset' (run with guild --debug for details)
You are about to run mtcf:test
  batch-size: 1024
  dataset-path: rfiw2021/Track1/sample0/test.txt
  device: '0'
  operation:mtcf: aa81ce57e78444e7880b5fa48cbeaa91
  output-dir: exp
  root-dir: rfiw2021/Track1
  threshold: 0.155
  weights: exp/best.pth
Continue? (Y/n)
Resolving file:weights
Resolving file:../rfiw2021/
Resolving file:models/insightface
Resolving operation:mtcf
Using run aa81ce57e78444e7880b5fa48cbeaa91 for operation:mtcf
2023-12-23 16:13:54.248687: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-23 16:13:54.251028: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-23 16:13:54.291826: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-23 16:13:54.291850: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-23 16:13:54.292940: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-23 16:13:54.299159: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-23 16:13:54.299352: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-23 16:13:55.072722: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Namespace(root_dir='rfiw2021/Track1', dataset_path='rfiw2021/Track1/sample0/test.txt', weights='exp/best.pth', threshold=0.155, output_dir=PosixPath('exp'), batch_size=1024, device='0', func=)
Current CUDA Device = 0
Device Name = NVIDIA GeForce RTX 3090
Loaded 39743 samples from rfiw2021/Track1/sample0/test.txt.
Validating...
██████████|39/39 [01:04<00:00, 1.65s/it]
acc: 0.614
```
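As a quick sanity check, the sample and batch counts in the two logs are mutually consistent. The sketch below redoes the arithmetic, assuming the final partial batch is kept rather than dropped (which matches the 253/253 and 39/39 progress bars); the variable names are mine.

```python
import math

batch_size = 1024  # batch-size flag shown in both logs

# Validation: one negative pair is added per loaded pair, doubling the set.
val_loaded = 129032
val_total = val_loaded * 2
val_batches = math.ceil(val_total / batch_size)

# Test: pairs are used as loaded, with no extra negatives reported in the log.
test_total = 39743
test_batches = math.ceil(test_total / batch_size)

print(val_total, val_batches, test_batches)  # 258064 253 39
```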