tunib-ai / parallelformers

Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
https://tunib-ai.github.io/parallelformers
Apache License 2.0

GPT models hang on large token generation. Lower performance? #15

Open mallorbc opened 2 years ago

mallorbc commented 2 years ago

I am using a 3060 and a 3090 to split GPT models two ways, including GPT-J and GPT Neo 2.7B. When generating many tokens, say 500, the model hangs and either takes an abnormally long time to finish or does not finish at all (I kill it). Generating 50 tokens does not have this issue.
While this happens, the 3090's memory is pinned at 100% while the 3060's stays low.

[Screenshot: GPU utilization during generation]

Subjectively, especially for GPT-J, the results, while not complete gibberish, seem to be of lower quality.
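For context, here is a minimal sketch of the kind of setup I'm running (the model name, prompt, and generation parameters are illustrative, not the exact values from my runs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

model_name = "EleutherAI/gpt-neo-2.7B"  # also tried GPT-J
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Split the model across the two GPUs (3060 + 3090)
parallelize(model, num_gpus=2, fp16=True)

inputs = tokenizer("Some long prompt here", return_tensors="pt")

# Generating ~50 new tokens is fine; pushing this toward 500 is where it hangs
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```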

mallorbc commented 2 years ago

Might this be a race condition between the two GPUs?