Open akolonin opened 5 years ago
The sample for item 3 is located at http://langlearn.singularitynet.io/data/aglushchenko_parses/suffix-problem/ . The above-mentioned token can easily be found in a rule in the dictionary file.
Looks like the problem with up.'and and up@'and is not the

akolonin@Ubuntu-1604-xenial-64-minimal:/home/aglushchenko/data/parses/suffix-problem$ grep -P ".\'" dict_20C_2019-01-28_0006.4.0.dict | wc -l
1

akolonin@Ubuntu-1604-xenial-64-minimal:/home/aglushchenko/data/parses/suffix-problem$ grep -P "up.\'and" dict_20C_2019-01-28_0006.4.0.dict | wc -l
1

grep -P "up.\'and" test-corpus-06.txt.raw
(dove)(,)(and)(flew)(up.'and)(into)(the)(air)(.)]

grep -P ".\'" test-corpus-06.txt.raw
(dove)(,)(and)(flew)(up.'and)(into)(the)(air)(.)]

grep -P ".\w" test-corpus-06.txt.raw | grep -v Found | grep -v Link
(the)(man)(at)(the)(other)(end)(of)(them)(..y)]
(as)(her)(..y)]
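One caveat about the patterns above: the `.` is unescaped, so it matches any character, and `".\w"` matches almost any line that contains a word character after some other character. A minimal Python sketch of searching for a literal period inside a token is given here; the function name and the assumption that tokens are wrapped in parentheses (as in the .raw output quoted above) are illustrative only.

```python
import re

# Text between one pair of parentheses, e.g. "(flew)" -> "flew".
TOKEN = re.compile(r"\(([^()]+)\)")

# A literal period (escaped) with a non-period character on each side.
INNER_PERIOD = re.compile(r"[^.]\.[^.]")


def tokens_with_inner_period(line):
    """Return the parenthesised tokens in `line` that contain an inner literal period."""
    return [t for t in TOKEN.findall(line) if INNER_PERIOD.search(t)]


# Sample line copied from the grep output in this thread.
sample = "(dove)(,)(and)(flew)(up.'and)(into)(the)(air)(.)]"
print(tokens_with_inner_period(sample))
```

With this definition the sentence-final `(.)` and the `(..y)` tokens are not reported, because their periods are not surrounded by non-period characters on both sides.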
@glicerico - in the MST-parsed version that you are preparing now, can we have the MST-Parser configured so that it does not break words with an inner period?
@akolonin , the new tokenizer-less version of the observer and MST-parser only splits by spaces, so this should not be a problem.
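To illustrate the difference, here is a minimal sketch that contrasts splitting on whitespace only with a naive punctuation-aware split. The sample sentence is reconstructed from the tokens quoted in this thread, and the regular expression is a hypothetical stand-in for contrast, not the tokenizer that was actually used before.

```python
import re

# Sample sentence reconstructed from the parse output quoted in this thread (an assumption).
sentence = "dove , and flew up.'and into the air ."

# Splitting on whitespace only: every token is kept exactly as written.
whitespace_tokens = sentence.split()

# Naive punctuation-aware split (hypothetical, for contrast only):
# runs of word characters and single punctuation marks become separate tokens.
punct_tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(whitespace_tokens)
print(punct_tokens)
```

With whitespace-only splitting, `up.'and` stays a single token, whereas the punctuation-aware split breaks it into `up`, `.`, `'` and `and`.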
A few problems:
up.'and
and up@'and
in Grammar Tester (GT).

@glicerico, do you think that using the period (.) in the WSD process and not re-coding periods to at signs (@) by GL could eliminate all of these problems without causing other problems in MST-Parsing?
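For reference, a minimal sketch of the mismatch in question. It assumes, as the two token forms in this thread suggest, that the re-coding step simply replaces every period with an at sign; `recode_periods` is a hypothetical name, not a function from GL.

```python
def recode_periods(token):
    """Hypothetical re-coding step: replace every period with an at sign."""
    return token.replace(".", "@")


corpus_token = "up.'and"                      # form found in the parses
recoded_token = recode_periods(corpus_token)  # form after re-coding

print(recoded_token)
print(recoded_token == corpus_token)
```

If one component sees the re-coded form and another sees the original, a plain string comparison between the two fails, which is the kind of mismatch that skipping the re-coding would avoid.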