uber-research / PPLM

Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
Apache License 2.0

Performance #8

Open astariul opened 4 years ago

astariul commented 4 years ago

Thanks for open-sourcing the code!

This approach is very interesting, but I'm curious about the impact on performance (inference speed).

Is there any benchmark showing the impact on performance with different parameters?

dathath commented 4 years ago

Thanks for the question! We have not run a detailed analysis of inference speed, but it is slower than normal inference because of the gradient-based updates to the activations. We are working on an extension that alleviates some of this, but generation does get slower as the number of gradient updates increases.
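
To make the cost concrete, here is a rough sketch of a perturb-then-decode step in PyTorch. This is not the repo's actual code: `model(token, past)` returning `(logits, new_past)` and `attribute_loss(logits)` are simplified, assumed signatures. The point is that every generated token pays for `num_iterations` extra forward and backward passes on top of the single forward pass that vanilla decoding needs:

```python
import torch

def perturbed_step(model, attribute_loss, past, token,
                   num_iterations=3, step_size=0.02):
    # One additive perturbation per cached layer activation in `past`.
    delta = [torch.zeros_like(p, requires_grad=True) for p in past]
    for _ in range(num_iterations):
        shifted = [p + d for p, d in zip(past, delta)]
        logits, _ = model(token, shifted)   # extra forward pass
        loss = attribute_loss(logits)       # e.g. topic bag-of-words or discriminator loss
        loss.backward()                     # extra backward pass
        with torch.no_grad():
            for d in delta:
                d -= step_size * d.grad     # nudge activations toward the attribute
                d.grad.zero_()
    shifted = [(p + d).detach() for p, d in zip(past, delta)]
    logits, new_past = model(token, shifted)  # the one forward pass vanilla decoding does
    return logits, new_past
```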

erik-dunteman commented 3 years ago

(not an issue or resolution, just a note)

I'm also super grateful you've open-sourced this! It's a very creative approach: perturbing the past activations and rerunning the model iteratively.

I've productionized this, figured I'd share some learnings:

In short, running this setup in production is tough: you can get decent speeds (5+ words per second with the smaller GPT-2 models on a GPU), but concurrent calls will queue, since the Flask server only has one worker (see the sketch below).
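
For anyone hitting the same wall, here is a minimal sketch of the pattern I'd reach for (names are hypothetical; `run_pplm` stands in for the actual generation call): keep one worker thread that owns the model and have request handlers push jobs onto an explicit queue, so requests queue deliberately instead of blocking Flask's single process.

```python
import queue
import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = queue.Queue()

def run_pplm(prompt):
    # Placeholder: call the real PPLM generation here.
    return prompt + " ..."

def model_worker():
    # A single thread owns the GPU model, so jobs are serialized explicitly.
    while True:
        prompt, done, result = jobs.get()
        result["text"] = run_pplm(prompt)
        done.set()

threading.Thread(target=model_worker, daemon=True).start()

@app.route("/generate", methods=["POST"])
def generate():
    done, result = threading.Event(), {}
    jobs.put((request.json["prompt"], done, result))
    done.wait()
    return jsonify(result)

if __name__ == "__main__":
    app.run(threaded=True)  # handlers run concurrently; GPU work still queues
```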

erik-dunteman commented 3 years ago

To directly answer the question: if I understand this code correctly, the inference cost is (1 + num_iterations) times that of simply calling the model as-is. That makes the simplifying assumption that the model's predict function accounts for 100% of the total inference time.
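
As a back-of-the-envelope check with made-up numbers (the 20 words/sec baseline is an illustrative assumption, not a measurement):

```python
# Illustrative only, not a measured benchmark.
baseline_wps = 20.0   # assumed throughput of an unperturbed small GPT-2 (words/sec)
num_iterations = 3    # gradient updates per generated token
print(baseline_wps / (1 + num_iterations))  # -> 5.0, in line with the ~5 wps noted above
```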