mehmetilker closed this issue 4 years ago
@mehmetilker Is NER included in your Stanza pipeline? If so, it is not fair to compare with stanfordnlp, as NER is a new feature in Stanza. While our NER model can achieve SOTA results, it uses contextualized word embeddings generated by a character-level RNN, which requires significant computational resources and favors a GPU. Can you disable the NER processor and compare it with stanfordnlp again? Thanks!
@yuhui-zh15 There is no NER model for the Turkish language, and I am using the 'tokenize,mwt,pos,lemma,depparse' processors. That way NER is already disabled, I guess.
@mehmetilker Can you provide the script you used? In that case, we can understand the problem more quickly!
@yuhui-zh15 My mistake. I have found the reason for the disk I/O problem; it is not related to stanza. The memory increase is still there. I will try to reproduce it with a sample. Until then I am closing the issue.
Can you share the solution you found? I am experiencing the same issue.
@mehmetilker @DesiPilla In general, if you're seeing memory errors with stack traces that look like model loading, that probably means your memory is too small to load all of the Stanza models you need at once. If you're in a VM or Docker environment, increasing the memory limit would help; otherwise you can also try to process the text one step at a time: Stanza processors have flags like tokenize_pretokenized and depparse_pretagged that take the output from previous stages without recomputing it. See the documentation for processors for more details!
Describe the bug After replacing stanfordnlp with stanza I am experiencing increased disk usage and memory consumption. Additionally, CPU usage looks more stable.
Expected behavior As I replaced the old library with the new one, with only a PyTorch upgrade (1.4 > 1.5), I expected little or no change.
Environment (please complete the following information):
Additional context There is a service that continuously parses some text and throws an exception after some time. I am using stanza with spacy_stanza (previously spacy_stanfordnlp); when I increase the batch size (pipe), I experience the problem more often.
You can see changes by the red line: