Closed andrePankraz closed 1 month ago
Would you provide the complete stack trace please?
Ultimately I would like to be able to recreate the problem, but the following doesn't OOM on a 3090 and comes nowhere near using up all my RAM:
import stanza
pipe = stanza.MultilingualPipeline(
    lang_id_config={"langid_clean_text": True,
                    "langid_lang_subset": ["de", "en"]},
    lang_configs={"de": {"processors": "tokenize,mwt", "verbose": False},
                  "en": {"processors": "tokenize", "verbose": False}})
text = "\n\n".join("This is a sample text %d" % i for i in range(10000))
# discarding the result each time
result = pipe(text)
text = "\n".join("This is a sample text %d" % i for i in range(10000))
result = pipe(text)
text = " ".join("This is a sample text %d" % i for i in range(10000))
result = pipe(text)
couldn't reproduce either, closing. thx
Describe the bug
I am seeing out-of-memory errors with processes growing to 35 GB; Stanza could be tracked down as the cause.
To Reproduce
Steps to reproduce the behavior:
self.nlp = stanza.MultilingualPipeline(
    model_dir=f"{get_from_env('model_dir', 'MODELS_FOLDER', 'data/models/')}stanza",
    lang_id_config={
        "langid_clean_text": True,
        "langid_lang_subset": ["de", "en"],
    },
    lang_configs={
        "de": {"processors": "tokenize,mwt", "verbose": False},
        "en": {"processors": "tokenize", "verbose": False},
    },
    use_gpu=False,
)
Expected behavior
No out of memory ;) For instance, by actually using batching in the pipelines?!
The classes accept some batching params at initialization, but don't seem to do anything with them (or I cannot see it). E.g. MultilingualPipeline.__init__ has a param ld_batch_size=64, which isn't used anywhere in this class (e.g. for initializing sub-processors). The processor LangIDBiLSTM also sets self.batch_size = batch_size with a default of 64, but again it doesn't seem to be used anywhere.
Do I have wrong expectations? OK, I can batch myself, but that doesn't seem to be the intention of this wrapper (and it shouldn't be), or I could call the LSTM directly without all this wrapper stuff.
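For reference, batching it yourself could look something like the sketch below: split the input into paragraphs, then call the pipeline once per moderately sized chunk so peak memory stays bounded by the chunk size. Here `pipe` stands for a callable such as the MultilingualPipeline above; `chunked` and `process_in_batches` are hypothetical helper names, not Stanza API.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def process_in_batches(pipe, paragraphs, batch_size=64):
    """Call `pipe` on one chunk of paragraphs at a time instead of the
    whole text at once, so each call sees a bounded amount of input."""
    results = []
    for batch in chunked(paragraphs, batch_size):
        # one moderate-sized call per chunk; join paragraphs back into a text
        results.append(pipe("\n\n".join(batch)))
    return results

# e.g.: docs = process_in_batches(pipe, text.split("\n\n"), batch_size=64)
```

This keeps each individual pipeline call small, but of course it only works around the issue rather than fixing the unused batch-size params.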