Closed akolonin closed 5 years ago
ALE50 dictionary line length
| Corpus | Maximum rule length |
|--------|---------------------|
| abs | 2008003 bytes (1960.94 Kbytes) |
| any | 4686148 bytes (4576.32 Kbytes) |
| lge | 1082744 bytes (1057.37 Kbytes) |
| rnd | 1616526 bytes (1578.64 Kbytes) |
| seq | 373869 bytes (365.11 Kbytes) |
| w10 | 4649181 bytes (4540.22 Kbytes) |
| w6r | 3757744 bytes (3669.67 Kbytes) |
It appears that link-parser fails when the maximum rule length crosses the 4 MB boundary.
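As a quick check of that observation, the maximum rule lengths listed above can be compared against the suspected boundary. This is only a sketch: it assumes "4 Mb" means 4 MiB (4194304 bytes), which the issue does not state, and the names `FOUR_MB` and `max_rule_bytes` are made up for illustration.

```python
# Assumption: the "4 Mb boundary" is 4 MiB = 4 * 1024 * 1024 bytes.
FOUR_MB = 4 * 1024 * 1024

# Maximum rule lengths in bytes, copied from the list in this issue.
max_rule_bytes = {
    "abs": 2008003,
    "any": 4686148,
    "lge": 1082744,
    "rnd": 1616526,
    "seq": 373869,
    "w10": 4649181,
    "w6r": 3757744,
}

# Corpora whose longest rule exceeds the suspected boundary.
over = sorted(name for name, n in max_rule_bytes.items() if n > FOUR_MB)
print(over)  # ['any', 'w10']
```

The same two corpora come out if the boundary is taken as 4000000 bytes instead, so the reading of "4 Mb" does not change the result for this data.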
Check the grammars for ALE50, lines 60-62, to see whether the segfaults in LG are caused by a rule line length greater than a particular threshold. Based on that, decide which of the three MST-parsing runs have to be retried (two failed to open the grammar, another took 200+ hours to complete).
For starters, we need to list the maximum rule lengths (overall, or for words and disjuncts separately) for the dict files in lines 57-63 for ALE-50 in the "Parses" tab: https://docs.google.com/spreadsheets/d/1o-4acGPxkMIS6-xJDxwjqDWAwIt8qx2xPemu14IZRaU/edit#gid=963717716
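One possible way to collect these numbers is a small script that reports the longest line of each dict file. This is a rough sketch, not the method actually used here: it assumes each rule occupies a single line of the dict file (so line length stands in for rule length), and the helper names `max_line_length` and `format_row` are hypothetical. Dict file paths are taken from the command line.

```python
import sys
from pathlib import Path

# Assumption: the suspected boundary is 4 MiB (4194304 bytes).
FOUR_MB = 4 * 1024 * 1024


def max_line_length(path):
    """Return the length in bytes of the longest line in the file."""
    longest = 0
    with open(path, "rb") as f:
        for line in f:
            # Do not count the line terminator itself.
            longest = max(longest, len(line.rstrip(b"\r\n")))
    return longest


def format_row(name, n, over):
    """Format one report row in the same style as the list in this issue."""
    row = f"{name} {n} bytes ({n / 1024:.2f} Kbytes)"
    return row + " > 4 MB" if over else row


if __name__ == "__main__":
    # Each command-line argument is a path to one dict file.
    for p in sys.argv[1:]:
        n = max_line_length(p)
        print(format_row(Path(p).name, n, n > FOUR_MB))
```

Reading the file in binary mode keeps the counts in bytes rather than characters, which matters if the dictionaries contain non-ASCII words. Measuring words and disjuncts separately would need an actual split of each rule, which this sketch does not attempt.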