pan-x-c / EE-LLM

EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).

EE on a Megatron checkpoint #3

Closed · iniverno closed this 6 months ago

iniverno commented 7 months ago

Describe the bug: I am trying to insert exit layers into a checkpoint previously saved with Megatron-LM. The conversion script expects many EE-specific arguments to be present in the checkpoint.

To Reproduce: Run the conversion script with its default parameters; it fails when trying to access checkpoint_args.exit_layer_nums.

pan-x-c commented 7 months ago

Sorry, the current version of the add_exit_layers.sh script has only been verified on checkpoints generated by EE-LLM. If your checkpoint was generated by the official Megatron-LM, you need to check whether the EE-specific args exist before accessing them, and fall back to default values when they do not.

If you need it urgently, you can modify tools/checkpoint/checkpoint_converter.py as described above. We will also test the script and fix this bug this week.
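
A minimal sketch of the suggested fallback, assuming the checkpoint args are exposed as a namespace-like object; only `exit_layer_nums` comes from the thread, and the stand-in args object and the `[]` default are illustrative assumptions:

```python
from argparse import Namespace

# Stand-in for the args object loaded from a vanilla Megatron-LM
# checkpoint, which lacks the EE-specific fields (the field names
# here are illustrative, not the full Megatron-LM arg set).
checkpoint_args = Namespace(num_layers=24, hidden_size=1024)

# Instead of accessing checkpoint_args.exit_layer_nums directly
# (which raises AttributeError on a plain Megatron-LM checkpoint),
# fall back to a default when the EE-specific arg is absent.
exit_layer_nums = getattr(checkpoint_args, "exit_layer_nums", [])
print(exit_layer_nums)  # [] for a checkpoint without early-exit args
```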

pan-x-c commented 7 months ago

Please check whether the fix/pxc/add_exit_layers branch solves the problem.