hpcaitech / FastFold

Optimizing AlphaFold Training and Inference on GPU Clusters
Apache License 2.0
557 stars 86 forks

add metrics for training #138

Closed Gy-Lu closed 1 year ago

Gy-Lu commented 1 year ago
  1. Add metrics for training
  2. Replace tqdm with a log of the metrics (not sure whether this is a good idea) and save the log to disk.

oahzxl commented 1 year ago

Great. Maybe it's better to add the total iteration number and keep fewer decimal places for the loss?

Gy-Lu commented 1 year ago

Done. It now looks like this:

[01/15/23 18:11:55] INFO     colossalai - colossalai - INFO: train.py:230 main
                    INFO     colossalai - colossalai - INFO: Training, Epoch: 0, Step: 1, Global_Step: 1, Loss: distogram=4.159 experimentally_resolved=0.693 fape=1.682
                             lddt=3.912 masked_msa=3.135 supervised_chi=0.937 violation=404.925 unscaled_loss=10.176 loss=162.822 lddt_ca=0.010 drmsd_ca=26.113
[01/15/23 18:12:07] INFO     colossalai - colossalai - INFO: train.py:230 main
                    INFO     colossalai - colossalai - INFO: Training, Epoch: 0, Step: 2, Global_Step: 2, Loss: distogram=4.159 experimentally_resolved=0.693 fape=1.543
                             lddt=3.912 masked_msa=3.135 supervised_chi=0.961 violation=393.072 unscaled_loss=10.062 loss=160.364 lddt_ca=0.009 drmsd_ca=22.707
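
For readers who want to reproduce a similar log line, here is a minimal sketch. It is not the code from this PR: it uses Python's standard `logging` module instead of the colossalai logger, and the names `format_metrics`, `get_file_logger`, and the `train_metrics` logger name are hypothetical. It assumes the metrics arrive as a plain dict of floats and rounds each value to three decimal places, as in the output above.

```python
import logging


def format_metrics(epoch, step, global_step, metrics, precision=3):
    """Build one log line from a dict mapping metric name -> float.

    Hypothetical helper: the layout mirrors the log output shown above,
    with each value rounded to `precision` decimal places.
    """
    parts = " ".join(
        f"{name}={value:.{precision}f}" for name, value in metrics.items()
    )
    return (
        f"Training, Epoch: {epoch}, Step: {step}, "
        f"Global_Step: {global_step}, Loss: {parts}"
    )


def get_file_logger(path):
    """Return a logger that appends INFO-level records to `path`.

    Hypothetical helper using the standard library only; the handler is
    attached once so repeated calls do not duplicate log lines.
    """
    logger = logging.getLogger("train_metrics")
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("[%(asctime)s] %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

In a training loop this would be called once per step, for example `get_file_logger("train.log").info(format_metrics(epoch, step, global_step, metrics))`, which keeps the metrics on disk without a progress bar.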