microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
3.01k stars 202 forks

How to test the model #106

Open ReloJeffrey opened 5 months ago

ReloJeffrey commented 5 months ago

The codebase provides the training code, but how can I reproduce the evaluation results in the paper 'DeepNet: Scaling Transformers to 1,000 Layers'? Could you please provide the code to reproduce the results in Table 6 and Table 7 of that paper?