microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License

AttributeError: 'EncoderConfig' object has no attribute 'decoder_layers' #43

Closed · dedekinds closed this issue 11 months ago

dedekinds commented 11 months ago

Hi, I plan to reproduce the results of the WMT-17 translation task presented in the DeepNet paper. Could you let me know what command I should run? For example, what should --arch be set to? Based on the examples in the README, should I run the following command?

cd examples/fairseq/
python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 train.py \
    ${PATH_TO_DATA} \
    --arch mt_base --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --max-tokens 4096 --fp16  --deepnorm

However, when I add --deepnorm to the command from the example, it throws an error: AttributeError: 'EncoderConfig' object has no attribute 'decoder_layers'. Could you please advise on the correct command and settings to obtain results similar to Table 1 in the paper? Thank you!
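
P.S. My guess is that this is related to DeepNet's residual-scaling constants: for an encoder-decoder model, the encoder's constants depend on the decoder depth as well, which would explain why the encoder config wants decoder_layers once --deepnorm is enabled. If I'm reading the paper correctly, with N encoder layers and M decoder layers they are

$$\alpha_{\text{enc}} = 0.81\,(N^4 M)^{1/16}, \qquad \beta_{\text{enc}} = 0.87\,(N^4 M)^{-1/16}$$

$$\alpha_{\text{dec}} = (3M)^{1/4}, \qquad \beta_{\text{dec}} = (12M)^{-1/4}$$

where α scales the residual branch and β scales the initialization. Please correct me if I'm misremembering the constants.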

shumingma commented 11 months ago

This has been fixed by this commit: https://github.com/microsoft/torchscale/commit/5356b252c43b2cba3638b0646c4f861854a4a854

Please give the latest commit a try.
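
In case it helps anyone else landing here: one way to pick up the fix is the usual reinstall from a source checkout (these are generic git/pip steps, not commands from the torchscale docs):

    git clone https://github.com/microsoft/torchscale.git   # skip if you already have a clone
    cd torchscale
    git pull          # make sure the checkout is at or past 5356b25
    pip install -e .  # editable install, so examples/fairseq picks up the fixed code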

dedekinds commented 11 months ago

> This has been fixed by this commit: 5356b25
>
> Please give the latest commit a try.

It works, thank you!