lojban / cll

Complete Lojban Language Chunked
http://lojban.org/publications/cll/

tosmabru test description is unclear #466

Open vpbroman opened 3 years ago

vpbroman commented 3 years ago

In section 4.11 the "tosmabru" test in step 5 of the lujvo-making algorithm is confusing. The serious bug in step 5 found in the printed book was fixed in this electronic edition, but it still needs some love. I rewrote it as follows.

5) Test all forms with one or more initial CVC-form rafsi (the pattern “CVC ... CVC + X”) for “tosmabru failure”. For a form to fail, X must either be a CVCCV long rafsi that happens to have a permissible initial pair as its consonant cluster, or be something that caused a “y”-hyphen to be inserted between the preceding CVC and X in step 4b. The test is as follows:

5a) Examine all the C/C consonant pairs up to the first “y”-hyphen, or up to the end of the word if there are no “y”-hyphens. These consonant pairs are called “joints”.

5b) If all of those joints are permissible initials, the trial word will break up into a cmavo and a shorter brivla, so a “y”-hyphen must be added at the first joint. If not, the word will not break up, and no further hyphens are needed.

Note that the “tosmabru” test, which affects hyphenation after the first rafsi, cannot be performed until after hyphenation to the right under step 4 has already been determined.
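To check that the wording is unambiguous, here is a minimal Python sketch of step 5 as I read it. It is not code from the CLL or any existing tool. The function names (`shape`, `tosmabru_fails`, `add_tosmabru_hyphen`) and the input format (a list of rafsi and “y”-hyphens as already produced by step 4) are my own assumptions, and the rafsi-shaped strings other than tos + mabru are made up purely to exercise the branches.

```python
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")

# The 48 permissible initial consonant pairs listed in CLL section 3.7.
PERMISSIBLE_INITIALS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv".split()
)


def shape(rafsi):
    """Map a rafsi to its C/V pattern, e.g. "mabru" -> "CVCCV".

    Apostrophes and "y" are skipped, so a "y"-hyphen maps to "".
    """
    return "".join(
        "V" if ch in VOWELS else "C"
        for ch in rafsi
        if ch in VOWELS or ch in CONSONANTS
    )


def tosmabru_fails(parts):
    """Return True if the trial word needs a "y"-hyphen at its first joint.

    `parts` is an assumed representation: the rafsi and "y"-hyphens
    after step 4, e.g. ["tos", "mabru"].
    """
    # Count the leading run of CVC rafsi (the "CVC ... CVC" part).
    n_cvc = 0
    while n_cvc < len(parts) and shape(parts[n_cvc]) == "CVC":
        n_cvc += 1
    if n_cvc == 0 or n_cvc == len(parts):
        return False

    # X must be a CVCCV rafsi, or be separated from the CVC run by "y".
    x = parts[n_cvc]
    if x != "y" and shape(x) != "CVCCV":
        return False

    # Step 5a: collect the joints up to the first "y"-hyphen (or word end).
    prefix = "".join(parts).split("y", 1)[0]
    joints = [
        a + b
        for a, b in zip(prefix, prefix[1:])
        if a in CONSONANTS and b in CONSONANTS
    ]

    # Step 5b: failure means there is at least one joint and every
    # joint is a permissible initial pair.
    return bool(joints) and all(j in PERMISSIBLE_INITIALS for j in joints)


def add_tosmabru_hyphen(parts):
    """Insert "y" after the first rafsi when the test fails."""
    if tosmabru_fails(parts):
        return [parts[0], "y"] + list(parts[1:])
    return list(parts)


print("".join(add_tosmabru_hyphen(["tos", "mabru"])))  # tosymabru
```

One choice worth flagging: a form like CVC + “y” + X has no joints before the hyphen, so the sketch treats it as passing rather than letting "all joints are permissible" hold vacuously. The proposed text does not say this explicitly, so it may deserve a sentence of its own.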

vpbroman commented 3 years ago

About the Note on one-pass hyphenation algorithms: doing the hyphenation in one pass is painful and confusing because of the state that must be maintained in the loop, and it wouldn't save any time anyway, unless you had billions of rafsi in a list that swaps in and out. Hah. So we don't bother analyzing what no one should try.