nlp-uoregon / trankit

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Apache License 2.0

Memory leak in Pipeline() on a CPU #51

Open navotoz opened 2 years ago

navotoz commented 2 years ago

Hello,

I initialized the model like so: nlp = Pipeline('english', gpu=False, cache_dir='./cache'). I then call it using: with torch.no_grad(): for idx in range(10000): nlp.lemmatize('Hello World', is_sent=True). While the code runs, RAM usage slowly grows.

I attached a graph of the memory filling up.

I'm using python3.7, trankit=1.1.0, torch=1.7.1.
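For anyone who wants to check the growth on their own machine, here is a minimal sketch of the loop above with memory sampling added. It is not the reporter's script: the helper names (peak_rss_mib, track_peak_rss, reproduce_with_trankit) are hypothetical, and it assumes a Unix system, since it relies on the standard-library resource module. Only the Pipeline(...) and lemmatize(...) calls come from the report.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_peak_rss(fn, iterations, report_every):
    """Call fn() repeatedly, sampling peak RSS every report_every calls."""
    samples = []
    for idx in range(1, iterations + 1):
        fn()
        if idx % report_every == 0:
            samples.append((idx, peak_rss_mib()))
    return samples


def reproduce_with_trankit(iterations=10000, report_every=1000):
    """Run the loop from the report; requires trankit and torch."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from trankit import Pipeline

    nlp = Pipeline('english', gpu=False, cache_dir='./cache')
    with torch.no_grad():
        return track_peak_rss(
            lambda: nlp.lemmatize('Hello World', is_sent=True),
            iterations,
            report_every,
        )
```

Calling reproduce_with_trankit() returns a list of (iteration, peak MiB) pairs. Peak RSS never decreases, so a steady rise across samples is what suggests a leak; a plateau after the first few samples would suggest warm-up allocation instead.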

Thank you!

olegpolivin commented 2 years ago

I can confirm: memory consumption keeps increasing when running on CPU. @navotoz, could you please tell me whether you have been able to solve this issue?

Dielianss commented 2 years ago

Hi @navotoz, I can confirm this issue also appears with Python 3.7, trankit 1.1.1, torch 1.8.1+cu101.

navotoz commented 2 years ago

Hi @Dielianss @olegpolivin, thanks for the comments. We managed to mitigate this issue by running inference in a Docker container and restarting it at a predefined interval. This is not a real solution, but at least we can work with the model.
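The same restart idea can be approximated inside a single Python program by recycling a worker process. The sketch below is not the Docker setup described above; it swaps in the standard library's multiprocessing.Pool with maxtasksperchild, which replaces the worker after a fixed number of tasks so that whatever memory it accumulated is returned to the OS. The names process_with_recycling, load_trankit_lemmatizer, and the ctx parameter are hypothetical, and the interval of 1000 is an arbitrary placeholder.

```python
import multiprocessing
import os

_model = None  # set once per worker process by _init_worker


def _init_worker(loader):
    """Load the model each time a fresh worker process starts."""
    global _model
    _model = loader()


def _run(text):
    """Apply the worker's model to one input; also report the worker PID."""
    return os.getpid(), _model(text)


def load_trankit_lemmatizer():
    """Build the CPU pipeline from the report; requires trankit."""
    from trankit import Pipeline

    nlp = Pipeline('english', gpu=False, cache_dir='./cache')
    return lambda text: nlp.lemmatize(text, is_sent=True)


def process_with_recycling(texts, loader, max_tasks_per_worker=1000, ctx=None):
    """Run loader()'s model over texts in one worker that is replaced
    after max_tasks_per_worker inputs."""
    ctx = ctx or multiprocessing.get_context()
    with ctx.Pool(
        processes=1,
        initializer=_init_worker,
        initargs=(loader,),
        maxtasksperchild=max_tasks_per_worker,
    ) as pool:
        # chunksize=1 makes every text count as one task toward the limit.
        return pool.map(_run, texts, chunksize=1)
```

A call such as process_with_recycling(sentences, load_trankit_lemmatizer) returns (worker PID, result) pairs, so the PID change shows where the worker was replaced. On platforms that start workers with spawn, the call should sit under an `if __name__ == "__main__":` guard. The trade-off is that the model is reloaded on every recycle, so the interval should be large enough to amortize the load time. Like the Docker restart, this only contains the growth and does not fix the underlying leak.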