Closed: Wenhan-Tan closed this issue 1 year ago.
Hi @Wenhan-Tan,
Generic checkpoint loading has been implemented for only a subset of models in GH-2547:
The reasoning behind not implementing GPT2 checkpoint loading is that, since it's a smaller model, it won't benefit as dramatically from this feature.
Thanks for raising this issue, we'll work through several things to help with this in the future:
Hi @lekurile ,
Thank you for replying! I understand GPT2 is a smaller model, but I'm trying to use a larger variant, GPT2-xl, which has 1.5B parameters and would benefit from this feature. If GPT3, with its 175B parameters, is released later, this feature would be really useful there as well. Do you have a timeline for when GPT2 will be supported? If not, please let me know what else I can do to make this work.
Thanks a lot!
Hi @Wenhan-Tan,
I've completed a PR (GH-2792) adding explicit error reporting in cases where meta tensor checkpoint loading is attempted on models that don't support the feature.
As far as GPT2 support goes, we don't have immediate plans, but if/when larger GPT variants are released, we'd prioritize adding support for this. One thing to bear in mind is that the meta tensor approach is specifically targeting loading models that cannot fit on a single GPU without tensor parallelism. We're not aware of GPT2 models that currently have that limitation.
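As a rough back-of-the-envelope check of that point, the fp16 weight footprints of the models mentioned in this thread can be estimated from their parameter counts (the 2-bytes-per-parameter fp16 assumption is mine, and this ignores activations and optimizer state):

```python
def fp16_model_size_gb(num_params):
    """Approximate fp16 weight footprint in GB (2 bytes per parameter)."""
    return num_params * 2 / 1024 ** 3


# GPT2-xl (~1.5B params) comes to roughly 2.8 GB in fp16,
# which fits comfortably on a single modern GPU.
gpt2_xl_gb = fp16_model_size_gb(1.5e9)

# A 175B-parameter model comes to roughly 326 GB in fp16,
# which cannot fit on one GPU without tensor parallelism.
gpt3_scale_gb = fp16_model_size_gb(175e9)
```

This illustrates why the meta tensor path targets the latter class of models and why GPT2 variants were not prioritized.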
However, we certainly encourage you to feel free to add a PR adding meta tensor support for GPT2 and appreciate any efforts to extend the support/functionality of DeepSpeed. 😃
I'll mark the issue as resolved in the meantime.
Thanks, Lev
I was trying to load GPT2 from checkpoint for inference but got `NotImplementedError` during policy replacement.

To Reproduce:

I ran the script with this command:

deepspeed --num_gpus 1 script.py

And got the following error:
Then I used the same script but changed the model to GPTJ, and it executed successfully.

I looked at the source file here: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/replace_policy.py#L487 and found that `HFGPT2LayerPolicy` does not have `get_param_names()` implemented. I checked the other policies and realized `get_param_names()` is implemented for only a few models. Is there a reason why GPT2 doesn't have it?
Then I tried to implement it for GPT2 myself like this:

But got another error below:

Not sure what I can do next. Can someone help me, even if it's a temporary solution? Thanks!