hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI
Apache License 2.0

[example]Support ViT convergence test with ColoTensor #147

Closed ZhaoYi1222 closed 2 years ago

ZhaoYi1222 commented 2 years ago

The training settings follow this log. A full log with loss and accuracy will be released once training completes.

[11/300] loss: 3.160 lr: 0.00103125                
[11/300] loss: 2.662 Accuracy: [10176/25088](0.406)
[21/300] loss: 2.424 lr: 0.00196875                
[21/300] loss: 2.060 Accuracy: [12845/24832](0.517)
[31/300] loss: 2.480 lr: 0.00290625                
[31/300] loss: 2.073 Accuracy: [12785/25088](0.510)
[41/300] loss: 2.344 lr: 0.0029934089730328474     
[41/300] loss: 1.962 Accuracy: [13288/24832](0.535)
[51/300] loss: 2.160 lr: 0.002966732165989418      
[51/300] loss: 1.899 Accuracy: [13801/25088](0.550)
[52/300] loss: 2.824 lr: 0.0029629487754680318     
[52/300] loss: 2.385 Accuracy: [11797/25088](0.470)
[53/300] loss: 2.544 lr: 0.00295896435798722       
[53/300] loss: 1.874 Accuracy: [13670/24832](0.550)
[54/300] loss: 2.216 lr: 0.002954779461054418      
[54/300] loss: 2.342 Accuracy: [11840/25088](0.472)
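
The logged learning rates are consistent with a linear warmup followed by cosine annealing. The sketch below reproduces them under assumptions inferred only from the numbers above, not from the training script: a peak rate of 0.003, a 32-epoch warmup, and cosine decay to zero over the remaining 268 epochs. The names `PEAK_LR`, `WARMUP_EPOCHS`, and `lr_at` are hypothetical, and the behavior exactly at the warmup boundary is a guess because no epoch near 32 appears in the log.

```python
import math

# Assumptions inferred from the logged values, not taken from the training script.
PEAK_LR = 0.003       # assumed peak learning rate
WARMUP_EPOCHS = 32    # assumed warmup length
TOTAL_EPOCHS = 300    # from the "[n/300]" prefix in the log


def lr_at(displayed_epoch: int) -> float:
    """Learning rate for the 1-based epoch number shown in the log."""
    if displayed_epoch <= WARMUP_EPOCHS:
        # Linear warmup: the rate grows by PEAK_LR / WARMUP_EPOCHS per epoch.
        return PEAK_LR * displayed_epoch / WARMUP_EPOCHS
    # Cosine annealing, counted from the 0-based epoch index after warmup.
    step = displayed_epoch - 1 - WARMUP_EPOCHS
    span = TOTAL_EPOCHS - WARMUP_EPOCHS
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * step / span))


# Compare against a few of the logged values.
for epoch, logged in [(11, 0.00103125), (41, 0.0029934089730328474),
                      (53, 0.00295896435798722)]:
    print(epoch, lr_at(epoch), logged)
```

If these assumptions hold, the learning rate changes by well under 1% between consecutive epochs in the 51 to 54 range, so the schedule by itself would not explain a large epoch-to-epoch swing in accuracy.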

The accuracy seems to be unstable, alternating between odd and even epochs: about 0.55 at epochs 51 and 53 versus about 0.47 at epochs 52 and 54.
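
One way to check this pattern once the full log is available is to parse the evaluation lines and compare mean accuracy on odd and even epochs. The following is a minimal sketch that assumes the line format shown above; `parse_accuracy` and `odd_even_means` are hypothetical helper names, and the embedded sample contains only the four consecutive epochs from this comment.

```python
import re
from statistics import mean

# Evaluation lines copied from the log above (epochs 51 to 54 only).
LOG = """\
[51/300] loss: 1.899 Accuracy: [13801/25088](0.550)
[52/300] loss: 2.385 Accuracy: [11797/25088](0.470)
[53/300] loss: 1.874 Accuracy: [13670/24832](0.550)
[54/300] loss: 2.342 Accuracy: [11840/25088](0.472)
"""

# Captures the epoch number, correct count, and total count.
PATTERN = re.compile(r"\[(\d+)/\d+\] loss: [\d.]+ Accuracy: \[(\d+)/(\d+)\]")


def parse_accuracy(log: str) -> dict:
    """Map epoch number -> accuracy recomputed from the correct/total counts."""
    acc = {}
    for line in log.splitlines():
        m = PATTERN.match(line.strip())
        if m:  # training lines ("... lr: ...") do not match and are skipped
            epoch, correct, total = map(int, m.groups())
            acc[epoch] = correct / total
    return acc


def odd_even_means(acc: dict) -> tuple:
    """Return (mean accuracy on odd epochs, mean accuracy on even epochs)."""
    odd = [a for e, a in acc.items() if e % 2 == 1]
    even = [a for e, a in acc.items() if e % 2 == 0]
    return mean(odd), mean(even)


acc = parse_accuracy(LOG)
odd_mean, even_mean = odd_even_means(acc)
print(f"odd: {odd_mean:.3f}  even: {even_mean:.3f}  gap: {odd_mean - even_mean:.3f}")
```

Four epochs are too few to confirm a real odd/even effect; the same functions can be run on the complete log by replacing `LOG` with its contents.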