tarsqi / ttk

Tarsqi Toolkit
Apache License 2.0

Memory Error with S2T module #25

Closed: sanjibksaha closed this issue 7 years ago

sanjibksaha commented 8 years ago

wsj_0585.txt

While processing this file (renamed from .tml to .txt so it could be uploaded here) with the latest version of the TTK Python pipeline, we encountered an S2T error. Here is the console output:

reading parameters ... tagging ... 1000
ERROR: S2T error: <type 'exceptions.MemoryError'>
finished.

The resulting output file was more than 500 MB. For comparison, the same Python pipeline produced an output file of 32 KB for the input file wsj_0169.tml.

marcverhagen commented 8 years ago

This is caused by the tagger hanging on this input. I need to check whether the hang was introduced by the wrapper simplifications from a few weeks ago.

sanjibksaha commented 8 years ago

wsj_0585.txt

I am really sorry, I uploaded the wrong file; that was the output file. The correct input file is attached to this reply.

marcverhagen commented 8 years ago

I take back what I said before: this is NOT caused by the tagger. When I tried to replicate the issue I did run into a tagger problem, but once I fixed that (it apparently does not occur for everyone) I still got the same error, and it is indeed an infinite loop in S2T.
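For illustration only: a loop that never terminates while it keeps producing output is consistent with both symptoms reported here (a MemoryError and a 500+ MB output file). One common guard is an iteration cap on any "repeat until nothing changes" loop, so a rule that keeps re-firing fails fast with a clear message instead of exhausting memory. This is a hypothetical sketch and is not taken from the S2T code; `apply_until_stable`, `NoFixedPointError`, the two example rule functions and the default cap of 1000 passes are all invented for the example.

```python
class NoFixedPointError(RuntimeError):
    """Raised when repeated rule application does not converge."""


def apply_until_stable(items, rewrite, max_iterations=1000):
    """Apply `rewrite` to `items` until the result stops changing.

    `rewrite` is a hypothetical rule function that takes a list and
    returns a new list. If no fixed point is reached within
    `max_iterations` passes, raise instead of looping (and growing
    the output) forever.
    """
    current = list(items)
    for _ in range(max_iterations):
        updated = rewrite(current)
        if updated == current:
            return current  # fixed point: another pass changes nothing
        current = updated
    raise NoFixedPointError(
        "no fixed point after %d iterations (%d items)"
        % (max_iterations, len(current)))


def converging(xs):
    # Hypothetical well-behaved rule: deduplicate and sort.
    # The second pass returns the same list, so the loop stops.
    return sorted(set(xs))


def diverging(xs):
    # Hypothetical buggy rule that re-fires on its own output,
    # adding one new item on every pass and never stabilising.
    return xs + [len(xs)]


if __name__ == "__main__":
    print(apply_until_stable([3, 1, 3], converging))
    try:
        apply_until_stable([], diverging, max_iterations=50)
    except NoFixedPointError as err:
        print("stopped early:", err)
```

The cap trades a small risk of stopping a legitimately long computation for a guarantee that a runaway rule is reported with a readable error rather than a MemoryError.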