singnet / language-learning

OpenCog Unsupervised Language Learning
https://wiki.opencog.org/w/Language_learning
MIT License

Segfaults in LG 5.5.1 caused by rule line length? #234

Closed akolonin closed 5 years ago

akolonin commented 5 years ago

Check the grammars for ALE50, lines 60-62, to see whether the segfaults in LG are caused by rule line length exceeding a particular threshold. Based on that, decide which of the three MST-parsing runs have to be retried (two failed to open the grammar, another took 200+ hours to complete).

For starters, we need to list the maximum rule lengths (overall, or for words and disjuncts separately) for the dict files in lines 57-63 for ALE-50 in the "Parses" tab: https://docs.google.com/spreadsheets/d/1o-4acGPxkMIS6-xJDxwjqDWAwIt8qx2xPemu14IZRaU/edit#gid=963717716
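One possible way to get such numbers is sketched below. It is not the script used in this thread. It assumes the usual Link Grammar dictionary layout, where `%` starts a comment and each rule ends with `;`, and it ignores the corner case of `%` or `;` appearing inside a quoted word. The function names are hypothetical.

```python
def max_line_length(data: bytes) -> int:
    """Length in bytes of the longest physical line in a dict file."""
    return max((len(line) for line in data.splitlines()), default=0)


def max_rule_length(data: bytes) -> int:
    """Length in bytes of the longest ';'-terminated rule.

    Simplification (assumption): everything after '%' on a line is a
    comment, and ';' never occurs inside a quoted word.
    """
    # Drop comments line by line, then re-join so multi-line rules stay whole.
    lines = [line.split(b"%", 1)[0] for line in data.splitlines()]
    body = b"\n".join(lines)
    # Each chunk between semicolons is one rule (words plus disjuncts).
    rules = [chunk.strip() for chunk in body.split(b";")]
    return max((len(rule) for rule in rules if rule), default=0)


def describe(n_bytes: int) -> str:
    """Format a length as bytes plus Kbytes (bytes / 1024)."""
    return f"{n_bytes} bytes ({n_bytes / 1024:.2f} Kbytes)"


# Hypothetical usage on a real file:
#   from pathlib import Path
#   print(describe(max_rule_length(Path("some.dict").read_bytes())))
```

Reading the file as bytes keeps the result comparable to a byte-sized buffer limit, regardless of how many multi-byte characters the words contain.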

alexei-gl commented 5 years ago
ALE50 dictionary line length

CORPUS  Maximum rule length
abs     2008003 bytes (1960.94 Kbytes)
any     4686148 bytes (4576.32 Kbytes)
lge     1082744 bytes (1057.37 Kbytes)
rnd     1616526 bytes (1578.64 Kbytes)
seq     373869 bytes (365.11 Kbytes)
w10     4649181 bytes (4540.22 Kbytes)
w6r     3757744 bytes (3669.67 Kbytes)

It appears that link-parser fails when the rule length crosses the 4 MB boundary.
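As a quick sanity check of that observation, the lengths from the table above can be compared against the threshold. This is only a sketch: it assumes "4 MB" means 4 * 1024 * 1024 bytes (consistent with the table dividing by 1024 for Kbytes), and the variable names are made up for illustration.

```python
# Maximum rule lengths in bytes, copied from the table above.
max_rule_bytes = {
    "abs": 2008003,
    "any": 4686148,
    "lge": 1082744,
    "rnd": 1616526,
    "seq": 373869,
    "w10": 4649181,
    "w6r": 3757744,
}

# Assumed threshold: 4 MB taken as 4 * 1024 * 1024 bytes.
LIMIT_BYTES = 4 * 1024 * 1024

# Corpora whose longest rule exceeds the assumed limit.
over_limit = sorted(
    corpus for corpus, size in max_rule_bytes.items() if size > LIMIT_BYTES
)

for corpus in sorted(max_rule_bytes):
    size = max_rule_bytes[corpus]
    status = "over" if size > LIMIT_BYTES else "under"
    print(f"{corpus}: {size / (1024 * 1024):.2f} MB ({status} the limit)")
```

Under this assumption only `any` and `w10` exceed the limit, and the same two are over if 4 MB is read as 4,000,000 bytes instead.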