Closed — joeyism closed this issue 4 years ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'd also be interested in this. The current distilgpt2 is great for use-cases that need cheap/fast compute, but distilled versions of the larger gpt2 models (medium, large, xl) would also be super useful. For example, I am able to fit up to gpt2-large on my GPU, but I'm unable to fit gpt2-xl, which means I can't use it. A smaller, distilled version of gpt2-xl might make it usable for more people.
Are there any plans to distill any larger versions of gpt2?
Thanks!
Yes, we can probably work on that. There is a bit of work and exploration to do: it is possible that we'll have to use model-parallelism tricks to be able to train it in a reasonable time (I haven't checked yet). Applying the same distillation ratios we used for distilgpt2 to gpt2-xl would result in a model (24 layers, 1600 hidden dim) that is still bigger than gpt2-medium. Would that fit your use-case?
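To make the size comparison concrete, here is a rough back-of-the-envelope estimate. The layer/dim numbers for the official gpt2 checkpoints are real; the "same ratio" distilled model (halving the layer count, as distilgpt2 did with 12 → 6 layers) and the parameter formula are illustrative assumptions, not an announced configuration.

```python
# Rough GPT-2-style parameter count: ~12 * L * d^2 for the transformer
# blocks (attention + MLP) plus the token-embedding matrix V * d.
# This is an approximation for illustration, not an exact count.

def approx_params(n_layers: int, d_model: int, vocab: int = 50257) -> int:
    return 12 * n_layers * d_model**2 + vocab * d_model

gpt2_medium = approx_params(24, 1024)   # official gpt2-medium shape
gpt2_large  = approx_params(36, 1280)   # official gpt2-large shape
distil_xl   = approx_params(24, 1600)   # hypothetical: gpt2-xl (48L, 1600) with layers halved

for name, n in [("gpt2-medium", gpt2_medium),
                ("gpt2-large", gpt2_large),
                ("distil-xl (hypothetical)", distil_xl)]:
    print(f"{name}: ~{n / 1e6:.0f}M params")
```

Under these assumptions the halved-layer gpt2-xl lands well above gpt2-medium, which matches the caveat above.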
(sorry for the delayed answer, I don't usually check issues without being pinged/tagged).
Applying the same distillation ratios we used for distilgpt2 to gpt2-xl would result in a model (24 layers, 1600 hidden dim) that is still bigger than gpt2-medium. Would that fit your use-case?
Yes, if we could squish the performance of gpt2-xl into something sized between gpt2-medium and gpt2-large, that would be really useful!
Even a distilgpt2-large would work for my use case
I am also interested in a distilled version of the larger models. For our use-case, this would go a long way to improving cost/performance/feasibility.
Bumping this - any word on availability of the medium/large distilled models?
I am currently working on it! :)
any news on this?
Any news on this? 😊
I would be extremely interested in having GPT2-XL distilled to the size of GPT2-L or smaller. Consumer-grade GPUs currently top out at around 8 GB of VRAM, which is enough to run inference with GPT2-L but not with GPT2-XL. Unless you can find a beefier GPU than that, efficiently running GPT2-XL on a desktop PC will only become possible once someone trains a distilled model.
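The 8 GB claim can be sanity-checked with a quick weights-only estimate. The parameter counts below are the commonly cited sizes for the checkpoints; the 4 bytes/param figure assumes fp32 inference, and real usage is higher once activations and framework overhead are included.

```python
# Weights-only VRAM estimate at fp32 (4 bytes per parameter).
# Activations, KV caches, and CUDA overhead add more on top of this,
# so these numbers are a lower bound, for illustration only.

def weight_gb(n_params: int, bytes_per_param: int = 4) -> float:
    return n_params * bytes_per_param / 1024**3

models = {
    "gpt2-large": 774_000_000,    # commonly cited parameter count
    "gpt2-xl":    1_558_000_000,  # commonly cited parameter count
}

for name, n in models.items():
    print(f"{name}: ~{weight_gb(n):.1f} GB of weights at fp32")
```

Weights alone put GPT2-XL most of the way to 8 GB at fp32, so once activations are added the fit fails, while GPT2-L leaves comfortable headroom.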
hey, any news on this?
Plans for distilgpt2-medium and distilgpt2-large
Motivation
While distilgpt2 is useful, I was wondering if there are any plans to create a distilgpt2-medium and distilgpt2-large. I'm also wondering how distilgpt2-medium would compare to gpt2, and distilgpt2-large to gpt2-medium, in size and performance.
Maybe it's not even worth pretraining those if distilgpt2-medium turns out larger than gpt2 and performs worse.