avivbrokman opened this issue 11 months ago
Ok, I'll do some experiments too and get back to you. Just to double check: you are giving the same prompt to GPT-2 and BioMedLM, running `generate`, and those numbers are the ratio between the two models?
Just this week I have been spending a lot of time working on BioMedLM's generative abilities for downstream tasks ... I actually feel it is most useful for scenarios like reading a PubMed abstract and printing out a list of relations derived from the abstract, for instance ...
BioMedLM out of the box should just literally be running the same code as GPT-2, since it is just a GPT-2 model with different weights and a different tokenizer ... it has a smaller vocabulary than GPT-2 ... we could also compare to GPT-Neo 2.7B ...
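If useful, a quick way to sanity-check that is to diff the two configs. Something like this sketch, assuming the usual Hub IDs:

```python
# Quick sketch (my suggestion, not from the repo): compare the two configs
# directly. Assumes the Hub IDs "stanford-crfm/BioMedLM" and "gpt2-xl".
from transformers import AutoConfig

bio_cfg = AutoConfig.from_pretrained("stanford-crfm/BioMedLM")
gpt2_cfg = AutoConfig.from_pretrained("gpt2-xl")

# Both should report model_type "gpt2"; the expected differences are the
# vocab size and the width/depth that make BioMedLM 2.7B vs GPT-2 XL's 1.5B.
for field in ("model_type", "vocab_size", "n_layer", "n_embd", "n_head", "n_positions"):
    print(field, getattr(bio_cfg, field, None), getattr(gpt2_cfg, field, None))
```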
And what exactly are the inputs -> outputs? Are BioMedLM and GPT-2 XL producing text of similar length, or is there a difference in average output length? I don't think setting `max_length` necessarily determines the average length of outputs, so if one model had a tendency to print out longer responses to inputs, it would possibly take longer?
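To illustrate that point with a stock checkpoint (just a sketch, using plain `gpt2` for speed):

```python
# Sketch: max_length is a ceiling, not a target. generate() returns as soon
# as the model emits its EOS token or the cap is reached, so two models given
# the same cap can produce very different average output lengths.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The patient presented with", return_tensors="pt")
out = model.generate(**inputs, max_length=256)
print("generated tokens:", out.shape[1] - inputs["input_ids"].shape[1])
```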
> Just to double check: you are giving the same prompt to GPT-2 and BioMedLM, running `generate`, and those numbers are the ratio between the two models?

Yes, to both.
> I actually feel it is most useful for scenarios like reading a PubMed abstract and printing out a list of relations derived from the abstract

I laughed when I read this, because I'm doing exactly that. I just wanted to provide a minimal example.
> BioMedLM out of the box should just literally be running the same code as GPT-2, since it is just a GPT-2 model with different weights and a different tokenizer

This is what I expected, which is why I'm confused about the difference in speed.
> Are BioMedLM and GPT-2 XL producing text of similar length, or is there a difference in average output length?

For my minimal example, they produce outputs within 2 tokens of each other, so I don't think sequence length accounts for it (my code also prints out the number of generated tokens). I'm guessing that small difference comes down to special tokens.
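For reference, the check boils down to something like this sketch (Hub IDs assumed):

```python
# Sketch of the check described above: compare vocab sizes and special
# tokens, and count how many tokens each tokenizer assigns the same prompt.
from transformers import AutoTokenizer

bio_tok = AutoTokenizer.from_pretrained("stanford-crfm/BioMedLM")
gpt2_tok = AutoTokenizer.from_pretrained("gpt2-xl")

for name, tok in [("BioMedLM", bio_tok), ("GPT-2 XL", gpt2_tok)]:
    print(name, "| vocab:", tok.vocab_size,
          "| eos:", tok.eos_token, "| pad:", tok.pad_token)

prompt = "Aspirin irreversibly inhibits cyclooxygenase."
print("BioMedLM prompt length:", len(bio_tok(prompt)["input_ids"]))
print("GPT-2 XL prompt length:", len(gpt2_tok(prompt)["input_ids"]))
```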
I am trying to use BioMedLM for generation, but I find that it is very slow at generating long sequences. Training runs at a normal speed. I wrote a minimal program (below) to reproduce this, comparing BioMedLM against GPT-2 (1.5B parameters) and Flan T5-XL (3B parameters). I varied the maximum generation length and estimated the ratio of the two decoder models' durations (BioMedLM divided by GPT-2):
- 1024 tokens: 5.9
- 512 tokens: 3.2
- 256 tokens: 1.9
- 128 tokens: 1.3
- 64 tokens: 1.01
Anecdotally, the generation speed is similar to that of Flan UL2, a 20B parameter model.
I'd like to fix this, but I don't know whether the issue is in the BioMedLM code, my software/environment versions and settings, or my hardware (an A100 80GB).
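A stripped-down sketch of this kind of timing comparison (not my exact program; it assumes a CUDA device, fp16 weights, and pins the output length with `min_new_tokens` so both models generate the same number of tokens):

```python
# Not the exact program from this issue -- a minimal sketch of the timing
# comparison described above. Assumes a CUDA device and the Hub IDs shown.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bench(model_id, prompt, lengths):
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda").eval()
    inputs = tok(prompt, return_tensors="pt").to("cuda")
    times = {}
    for n in lengths:
        torch.cuda.synchronize()
        start = time.perf_counter()
        # min_new_tokens == max_new_tokens pins the output length, so the
        # two models are timed on the same amount of generated text.
        model.generate(**inputs, max_new_tokens=n, min_new_tokens=n)
        torch.cuda.synchronize()
        times[n] = time.perf_counter() - start
    return times

lengths = (64, 128, 256, 512, 1024)
bio = bench("stanford-crfm/BioMedLM", "The study found that", lengths)
gpt2 = bench("gpt2-xl", "The study found that", lengths)
for n in lengths:
    print(f"{n} tokens: BioMedLM/GPT-2 ratio = {bio[n] / gpt2[n]:.2f}")
```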