minimaxir / gpt-2-cloud-run

Text-generation API via GPT-2 for Cloud Run
MIT License

Reduce memory consumption to prevent errors due to container OOM #5

Closed: minimaxir closed this issue 5 years ago

minimaxir commented 5 years ago

Containers seem to go OOM after ~10 generations, despite garbage collection. Loading the model alone takes ~1.5 GB, so hitting the memory ceiling is not surprising, but there should be a way to control the leak.

minimaxir commented 5 years ago

The current implementation (reloading the model after every 8 generations) appears to avoid OOMs.
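
For context, a minimal sketch of that session-recycling approach, assuming the `gpt_2_simple` helpers `start_tf_sess`, `load_gpt2`, and `generate`; the counter name and threshold here are illustrative, not this repo's exact code:

```python
import gc

import gpt_2_simple as gpt2
import tensorflow as tf

GENERATIONS_PER_SESSION = 8  # recycle the TF session after this many generations

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)
generation_count = 0


def generate_text(**gen_params):
    """Generate one text, reloading the model every few calls to cap memory."""
    global sess, generation_count
    text = gpt2.generate(sess, return_as_list=True, **gen_params)[0]
    generation_count += 1
    if generation_count >= GENERATIONS_PER_SESSION:
        # Tear down the session and default graph, collect garbage, then
        # reload the model so memory accumulated across generations is freed.
        sess.close()
        tf.reset_default_graph()
        gc.collect()
        sess = gpt2.start_tf_sess()
        gpt2.load_gpt2(sess)
        generation_count = 0
    return text
```

The trade-off is a periodic latency spike while the checkpoint is reloaded, in exchange for keeping the container under Cloud Run's memory limit.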

deepbluesea commented 5 years ago

This will reduce the memory consumption by a lot: `tensor2tensor.utils.adafactor.AdafactorOptimizer`.
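
For readers landing here, a hedged sketch of what that suggestion looks like (assuming a TF1 fine-tuning setup; the toy variable and loss are placeholders, not this project's training code). Adafactor keeps factored second-moment statistics instead of Adam's two full-size accumulators per weight, which is where the savings come from, so it chiefly reduces fine-tuning memory rather than inference memory:

```python
import tensorflow as tf
from tensor2tensor.utils.adafactor import AdafactorOptimizer

# Stand-in for the GPT-2 training loss built elsewhere (placeholder).
weights = tf.get_variable("weights", shape=[768, 768])
loss = tf.reduce_mean(tf.square(weights))

# Adam would allocate two full-size slots per weight matrix; Adafactor's
# factored statistics only need per-row and per-column accumulators.
optimizer = AdafactorOptimizer(learning_rate=1e-4)
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
```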