gregorio-project / hyphen-la

Latin ecclesiastic hyphenation patterns and resources
http://gregorio-project.github.io/hyphen-la/
MIT License

Word lists #38

Open wehro opened 6 years ago

wehro commented 6 years ago

There is one duplicate in wordlist-liturgical.txt: eucharistia. As far as I can see, the entry a-bi-ens is wrong. This is a participle of ab-ire. There are eight accented forms in the file, all of them beginning with trans.

There is also one duplicate in wordlist-liturgical-accents.txt: transpadáneus. There are some entries with three or more syllables not having an accent in the file: abstergo adieuntis (strange form, does it exist?) altertra astasti astastis compinguescerent cumscribillo discrepentia displuvia distrivsti distrivstis elanguescam elanguescerent epithalamus eschato euhias exsanguescerent horreus impinguescerent impinguescetis inexstinguibilis inexstinguibiliter interstinguant languescerent languoris linguacioris linguosioris longivus obfidire obiex perieam perieamus perieant perieas perieatis periee (also strange) perieim perieimus perieint perieis perieit perieitis perieo perieunt periur perunguerent pinguescerent præaudio præiens præobturans præstruo relanguescerent respire sanguinolentus satisaccipere semetipsum sorbuis substinguitur superescit superimpedens suscribere
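A check like the one described above could be scripted roughly as follows. This is only a sketch: it assumes the file holds one entry per line with hyphens marking the syllable breaks, and the function names are hypothetical, not part of the repository.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, visible after NFD decomposition


def has_accent(word: str) -> bool:
    """True if any letter of the word carries an acute accent."""
    return ACUTE in unicodedata.normalize("NFD", word)


def missing_accent(entries):
    """Return entries with three or more syllables and no acute accent.

    Assumption: each entry is hyphenated syllable by syllable, so the
    number of hyphen-separated parts equals the number of syllables.
    """
    flagged = []
    for entry in entries:
        word = entry.strip()
        if len(word.split("-")) >= 3 and not has_accent(word):
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    # Sample entries taken from this thread, not from the real file.
    sample = ["trans-e-ré-re", "ab-i-ens", "abs-truo"]
    print(missing_accent(sample))
```

Words of one or two syllables are skipped on purpose, since an accent mark carries no information there.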

fradec commented 6 years ago

Thank you for your careful proofreading. Duplicates like eucharistia and transpadáneus are often caused by the same word also being present with a capital letter, but a single entry is quite enough! I try to weed out duplicates whenever I extend the word list, but some often escape me!
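Duplicates that differ only by a capital letter could be caught with a case-insensitive comparison. The following is a minimal sketch, assuming one entry per line; the function name is hypothetical and hyphens are ignored so that differently hyphenated copies of the same word are grouped together.

```python
from collections import defaultdict


def find_duplicates(entries):
    """Group entries that are identical once case and hyphens are ignored."""
    seen = defaultdict(list)
    for entry in entries:
        word = entry.strip()
        if word:
            # casefold() makes the comparison case-insensitive
            key = word.replace("-", "").casefold()
            seen[key].append(word)
    # Keep only the keys that occur more than once
    return {key: forms for key, forms in seen.items() if len(forms) > 1}


if __name__ == "__main__":
    # Illustrative sample, not the real file contents.
    sample = ["Daphne", "angulus", "daphne", "Epiphania"]
    for key, forms in find_duplicates(sample).items():
        print(key, forms)
```

Accents are deliberately left in place here, so homographs that differ only by accent position are not reported as duplicates.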

It's ok for ab-i-ens. I dream of proofreading all the words beginning with ab-… I will correct at least this one.

I'm sorry, but I don't understand when you say:

There are eight accented forms in the file, all of them beginning with trans.

As for the unaccented words in the accents file, I think it will be necessary to proofread them one by one and to remove those that have no reason to be there. I will do this as soon as possible.

wehro commented 6 years ago

Thank you for your comments and your indefatigable work.

As far as I can see, the file wordlist-liturgical.txt is intended not to contain accented characters. But the following words are present there and should perhaps be moved to wordlist-liturgical-accents.txt: trans-é-re trán-se-re trans-e-ré-re tran-sé-re-re trans-e-ré-ris tran-sé-re-ris trans-é-ris trán-se-ris

fradec commented 6 years ago

Yes, you're right. I did not add any documentation on this precise topic. These are homographs whose hyphenation differs depending on the position of the accent (this is well documented in the homograph section of the doc). As the unaccented pattern cannot be right for both cases, there is inevitably an error message if the accented words are put in the dedicated file. By placing them in the file that gives the correct hyphenation, I wanted to keep them from falling into oblivion.
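To illustrate the conflict: once the accents are stripped, such homographs collapse to the same spelling but keep different break points, so no single unaccented pattern can satisfy both. A minimal sketch that lists such cases, assuming one hyphenated entry per line (the function names are hypothetical):

```python
import unicodedata
from collections import defaultdict


def strip_accents(text: str) -> str:
    """Remove combining marks such as the acute accent."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def conflicting_homographs(entries):
    """Map each unaccented spelling to its hyphenations when they disagree."""
    by_word = defaultdict(set)
    for entry in entries:
        plain = strip_accents(entry.strip())
        if plain:
            by_word[plain.replace("-", "")].add(plain)
    return {word: sorted(hyphs) for word, hyphs in by_word.items() if len(hyphs) > 1}


if __name__ == "__main__":
    # Entries quoted in this thread.
    sample = ["trans-é-re", "trán-se-re", "ab-i-ens"]
    print(conflicting_homographs(sample))
    # prints {'transere': ['tran-se-re', 'trans-e-re']}
```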

To do this properly, I would have to do the same for deuteris and supereris, which are in the same situation. For these two words, the solution so far was to remove them from the list to avoid an error message.

wehro commented 6 years ago

OK, I see.

Four of the eight forms mentioned above seem to be derived from tran(s)-sero, but what is the origin of trans-é-re, trans-e-ré-re, trans-e-ré-ris, and trans-é-ris? I don't see how these can be derived from trans-eo.

I found some more duplicates in the word list: castricius, nympheum, pœniceus, angulus, and Daphne.

wehro commented 6 years ago

There are several compounds of iacio in the list, e.g. ab-icio with its medieval variant ab-jicio. I think it is not good to write ab-iicio and the like. As far as I know, the double i variant does not exist.

wehro commented 6 years ago

There are many forms of the rare verb abstruo in the list. I suppose the radix is struo and not truo, so I suggest to hyphenate ab-stru-o instead of abs-tru-o.

fradec commented 6 years ago

Four of the eight forms mentioned above seem to be derived from tran(s)-sero, but what is the origin of trans-é-re, trans-e-ré-re, trans-e-ré-ris, and trans-é-ris? I don't see how these can be derived from trans-eo.

Those are passive forms of transeo. See here for more details if you want.

I found some more duplicates in the word list: castricius, nympheum, pœniceus, angulus, and Daphne.

Ok, I will clean them all up soon.

There are several compounds of iacio in the list, e.g. ab-icio with its medieval variant ab-jicio. I think it is not good to write ab-iicio and the like. As far as I know, the double i variant does not exist.

I think so too. It would be ab-icio and ab-jicio.

There are many forms of the rare verb abstruo in the list. I suppose the radix is struo and not truo, so I suggest to hyphenate ab-stru-o instead of abs-tru-o.

I agree that abstruo is a rare verb. But the correct hyphenation is abs-truo: it is an inflected form of abs-trudo. That's why I put abs-truo.

fradec commented 6 years ago

There are several compounds of iacio in the list, e.g. ab-icio with its medieval variant ab-jicio. I think it is not good to write ab-iicio and the like. As far as I know, the double i variant does not exist.

For this particular point, I think it may be useful to open a separate issue.

wehro commented 6 years ago

Four of the eight forms mentioned above seem to be derived from tran(s)-sero, but what is the origin of trans-é-re, trans-e-ré-re, trans-e-ré-ris, and trans-é-ris? I don't see how these can be derived from trans-eo.

Those are passive forms of transeo. See here for more details if you want.

Are you sure the forms mentioned there are correct? My grammar disagrees with those forms in several cases. It does not give a full paradigm, but lists the passive forms ad-eor, ad-iris, ad-itur, ad-ibar, ad-ibor, ad-ear, ad-irer. It also states that e is used before vowels (e.g. eatur) and i before consonants (e.g. itur).

fradec commented 5 years ago

I can't say I'm sure, since I have not studied the question in depth, but I trust this source, which is usually correct. Indeed, it usually gives only the forms that actually exist, not those that an automatic lemmatizer could deduce algorithmically. Since these are passive forms and a priori infrequent, I did not see fit to dig deeper into the question.

I do not know which grammar you are using; the authors probably have good reason to state what you say. Note, however, that collatinus does not give passive forms for the present, the future and the imperfect for adeo nor for transeo. For this verb, moreover, it gives no passive form at all.

I wanted to be as complete as possible by including these passive forms that may be disputable, in case… But one solution could be to remove them.

fradec commented 5 years ago

I proofread the list of unaccented words. Everything is fixed, including the corresponding patterns as needed.

Note that I deleted 4 words that I did not find anywhere in the consulted dictionaries:

One word had two syllables and so an accent is useless: obiex (for objex).

I left out the words that begin with per-, because they should be proofread more systematically, as should line 206 of the doc, which seems dubious. I will open a new ticket on this subject.

wehro commented 5 years ago

I have the impression that those passive forms have been created automatically and then corrected by hand for the third person singular only (ire without prefix only has an impersonal passive). If you compare the active and the passive scheme of transire given by this internet source, the change of the vowels does not seem to be plausible: indicative present active transimus, but passive transemur; subjunctive imperfect active transirem, but passive transerer.

It is good to be as comprehensive as possible, but we should be careful not to propagate non-existing word forms. Internet sources tend to be incomplete or faulty, even if they may give good results in other cases.

I suggest replacing trans-é-re by trans-í-re, trans-e-ré-re by trans-i-ré-re, trans-e-ré-ris by trans-i-ré-ris, and trans-é-ris by trans-í-ris.

fradec commented 5 years ago

I suggest replacing trans-é-re by trans-í-re, trans-e-ré-re by trans-i-ré-re, trans-e-ré-ris by trans-i-ré-ris, and trans-é-ris by trans-í-ris.

I agree. This solution is consistent with your source, and in addition removes homographs.

wehro commented 5 years ago

There are many forms of the rare verb abstruo in the list. I suppose the radix is struo and not truo, so I suggest to hyphenate ab-stru-o instead of abs-tru-o.

I agree that abstruo is a rare verb. But the correct hyphenation is abs-truo: it is an inflected form of abs-trudo. That's why I put abs-truo.

How can this be an inflected form of abs-trudo? Where is the d gone? The Gaffiot has two separate entries for abstrudo and abstruo. The meaning is similar, but not the same. In other dictionaries, I could not find abstruo at all. Its etymology seems to be quite unclear to me.

wehro commented 5 years ago

Epiphania is a duplicate.

wehro commented 5 years ago

There are many forms of the rare verb abstruo in the list. I suppose the radix is struo and not truo, so I suggest to hyphenate ab-stru-o instead of abs-tru-o.

I agree that abstruo is a rare verb. But the correct hyphenation is abs-truo: it is an inflected form of abs-trudo. That's why I put abs-truo.

How can this be an inflected form of abs-trudo? Where is the d gone? The Gaffiot has two separate entries for abstrudo and abstruo. The meaning is similar, but not the same. In other dictionaries, I could not find abstruo at all. Its etymology seems to be quite unclear to me.

I found an article about abstruo in the Archiv für lateinische Lexikographie und Grammatik, a series that appeared in the late 19th century to prepare the Thesaurus linguae Latinae. The author argues that abstruo should not appear in the dictionary, because the only instance is a text by Tertullian, where it is probably a misspelling of abs-trudo. So my theory of deriving it from ab-struo has to be rejected and I agree with your hyphenation abs-truo.

fradec commented 5 years ago

Ok, so it would be good to clean the word list; I see that I included abstruo from collatinus-web.

I am reopening this issue as a reminder.

wehro commented 5 years ago

I am not sure how those “ghost words” should be treated. abstruo is certainly not the only one. For example, when I looked for the etymology of prodicius/proditius (both listed in Gaffiot) in ThLL, I did not find proditius, and for prodicius it was stated that it is a faulty conjecture of a certain editor in a certain text.

Ghost words will never occur in modern text editions, but their hyphenation might be needed when old text editions or dictionaries are retypeset, as was the case for the Gaffiot 2016. Some of them, which arose early enough, may even occur as original words in medieval texts, e.g. here for abstruere. So it might be safer to keep them.