Open alexei-gl opened 5 years ago
A dictionary version check has been added to the LG-based parser. If the Link Grammar version and the version of the dictionary used for parsing do not match, an exception is raised. The LG version check has also been restored in the grammar learner code.
The error still occurs in a situation where dictionary version corruption is unlikely. In a sequence of 5 tests with the same settings, 4 pass and the 5th crashes. The corpus is extracted from the reference file in all 5 tests.
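For readers unfamiliar with this kind of check, here is a minimal sketch of what a dictionary version check could look like. It is not the code from the PR. It assumes the dictionary text carries an entry of the form `<dictionary-version-number>: V5v5v0+;` and that the Link Grammar version is available as a dotted string such as "5.5.0"; the names `DictionaryVersionError`, `dictionary_version` and `check_versions` are hypothetical, and requiring an exact match is also an assumption.

```python
import re


class DictionaryVersionError(Exception):
    """Hypothetical exception raised when the versions do not match."""


# Assumed entry format: <dictionary-version-number>: V5v5v0+;
_VERSION_RE = re.compile(
    r"<dictionary-version-number>\s*:\s*V(\d+)v(\d+)v(\d+)\+\s*;"
)


def dictionary_version(dict_text):
    """Extract (major, minor, patch) from the dictionary text."""
    match = _VERSION_RE.search(dict_text)
    if match is None:
        raise DictionaryVersionError("no <dictionary-version-number> entry found")
    return tuple(int(part) for part in match.groups())


def check_versions(dict_text, lg_version):
    """Raise if the dictionary version differs from the LG version string."""
    dict_ver = dictionary_version(dict_text)
    lg_ver = tuple(int(part) for part in lg_version.split(".")[:3])
    if dict_ver != lg_ver:
        raise DictionaryVersionError(
            "dictionary version %s does not match Link Grammar version %s"
            % (".".join(map(str, dict_ver)), lg_version)
        )
    return dict_ver
```

Whether the real check should require an exact match or only a matching major and minor version is left open here.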
Static html copy of the Jupyter notebook -- GCB-LG-E-clean-ALE-MWC=1-MSL=10-2019-02-17_LGParseError.html, error "LGParseError: Number of sentences in corpus and reference files missmatch. Reference file '/home/obaskov/94/language-learning/data/GCB/LG-E-clean/GCB-LG-English-clean.ull' does not match its corpus counterpart 104341 != 104340" in cell 15.
The faulty grammar directory -- GCB-LG-E-clean-ALE-MWC=1-MSL=10-2019-02-17_LGParseError/GCB_LG-E-clean_cALWEd_no-gen_20c/
Another sample -- GCB-LG-E-clean-MWC=D1-MSL=5-2019-02-17_LGParseError.ipynb, the same cell 15 with 20 clusters test.
Static html copy of the notebook -- GCB-LG-E-clean-ALE-MWC=1-MSL=5-2019-02-17_LGParseError.html, link to data in the 1st cell of the notebook.
@OlegBaskov I made a new PR with fixes. Please update your code base by running `git pull`.
@alexei-gl - please test the issue with the configuration Oleg reported and close the issue if it works.
@alexei-gl - can we close this now?
Grammar tester reports "Number of sentences in corpus and reference files missmatch" when the dictionary was generated for the wrong Link Grammar version. This happens when link-parser is unable to find words in the supplied dictionary and the UNKNOWN-WORD rule in the dictionary is written with or without '<>' (depending on the LG version), which makes the rule useless and produces non-fatal link-parser errors. Some sentences are left unparsed because of the unknown words, so the number of parsed sentences does not match the number of sentences in the reference parses.
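To see which sentences were dropped, rather than only that the counts differ, something like the sketch below may help. It is not part of the grammar tester. It assumes both files use a layout where each sentence is a block separated by a blank line, with the sentence text on the first line of the block and link lines after it; `split_blocks` and `missing_sentences` are hypothetical helper names.

```python
import re


def split_blocks(ull_text):
    """Split text into non-empty blocks separated by blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", ull_text.strip()) if b.strip()]


def missing_sentences(reference_text, parsed_text):
    """Return reference sentences that have no counterpart in the parser output.

    Assumes the first line of every block is the sentence itself.
    """
    reference = [block.splitlines()[0] for block in split_blocks(reference_text)]
    parsed = {block.splitlines()[0] for block in split_blocks(parsed_text)}
    return [sentence for sentence in reference if sentence not in parsed]
```

Calling `missing_sentences(open(ref_path).read(), open(out_path).read())` on the reference file and the parser output (both paths are placeholders) would list the sentences that link-parser left unparsed, which should make it easier to tell an unknown-word problem from a genuinely corrupted file.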