singnet / language-learning

OpenCog Unsupervised Language Learning
https://wiki.opencog.org/w/Language_learning
MIT License

Wrong tagging in iterative grammar learning with "Gutenberg Children Books" corpus #177

Open OlegBaskov opened 5 years ago

OlegBaskov commented 5 years ago

The tagged grammar .dict and cat_tree files contain plain words alongside cluster tags.
Is this a tagging issue, an issue in filtering the input parses, or an issue in the corpus that prevents correct link extraction?
Jupyter notebook -- Iterative-clustering-ILE-POCE-CDS-2019-02-27.ipynb, static html copy.
This issue is a copy of https://github.com/OlegBaskov/language-learning/issues/79.

OlegBaskov commented 5 years ago

The problems start at cell 28, row 2 of the notebook (277 clusters), data -- .../GCB_LG-E-clean_dILEd_no-gen_0c_mwc=21/iteration_2:
the tagged category tree 277_cat_tree.txt.tagged should contain only tagged categories like ###aab###; however, starting from line 204, categories contain non-tagged words.
The same issue is observed in the following cells, 33-36.
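
A quick way to locate the offending lines is to scan a tagged file for tokens that do not look like tags. The sketch below is only an illustration, not part of the project's code: the function name `find_untagged` is hypothetical, and it assumes that each line is tab-separated, that the word list is the last field, and that a tag is lowercase letters wrapped in triple hashes as in ###aab###. The real cat_tree layout may differ, so the field index and separator are parameters.

```python
import re

# Assumption: a tag is lowercase letters wrapped in triple hashes, e.g. ###aab###.
TAG_RE = re.compile(r"^###[a-z]+###$")


def find_untagged(lines, words_field=-1, sep="\t"):
    """Return (line_number, untagged_tokens) for every line whose word field
    contains tokens that are not tags.

    Hypothetical helper: the field layout (tab-separated, words in the last
    field) is an assumption, not the documented cat_tree format.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        words = fields[words_field].split()
        bad = [w for w in words if not TAG_RE.match(w)]
        if bad:
            problems.append((lineno, bad))
    return problems


# Made-up sample lines, for illustration only.
sample = [
    "C01\tC00\t###aab### ###aac###\n",
    "C02\tC00\t###aad### forest\n",
]
print(find_untagged(sample))
```

To run it against a real file, pass an open file object instead of `sample`, for example `find_untagged(open("277_cat_tree.txt.tagged"))`, after adjusting `words_field` and `sep` to the actual layout.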