possible cognate review on a few items?

HedvigS commented 4 months ago

Hi UraLex team. Thank you for a lovely dataset!

I noticed a few forms in different languages that are identical to each other and pertain to the same parameter, but have been put in different cognate classes. Some of them are very short, so they are probably cases of different original forms being reduced to the same shape. Some of them may be loans as well, I don't know. "muž*" is perhaps a loan, I think, so that might explain that one. But maybe "tuman" for "fog" should be looked over once more?

Table of forms for the same parameters and how many cognate classes the form occurs in.

| Form | Parameter_ID | n |
|---|---|---|
| da | 103 | 2 |
| i | 103 | 4 |
| ja | 103 | 3 |
| mam | 154 | 2 |
| mužik | 145 | 2 |
| mužik | 253 | 2 |
| onu | 298 | 2 |
| tolʹko | 254 | 2 |
| tuman | 133 | 2 |
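For anyone who wants to reproduce a count like this, here is a minimal sketch in Python. It assumes each row of the forms table has been read into a dict with `Form` and `Parameter_ID` keys plus a cognate class key, here called `Cognateset_ID`. That last column name is an assumption, and the rows below are made-up placeholders, not UraLex data.

```python
from collections import defaultdict


def forms_in_multiple_classes(rows):
    """Return (form, parameter, n) for every form that occurs in more than
    one cognate class for the same parameter, sorted by form and parameter."""
    classes = defaultdict(set)
    for row in rows:
        # Collect the distinct cognate classes seen for each (form, parameter) pair.
        classes[(row["Form"], row["Parameter_ID"])].add(row["Cognateset_ID"])
    return sorted(
        (form, param, len(sets))
        for (form, param), sets in classes.items()
        if len(sets) > 1
    )


# Placeholder rows for illustration only; "Cognateset_ID" is an assumed column name.
rows = [
    {"Form": "aaa", "Parameter_ID": "1", "Cognateset_ID": "1-a"},
    {"Form": "aaa", "Parameter_ID": "1", "Cognateset_ID": "1-b"},
    {"Form": "bbb", "Parameter_ID": "1", "Cognateset_ID": "1-a"},
    {"Form": "aaa", "Parameter_ID": "2", "Cognateset_ID": "2-a"},
]

for form, param, n in forms_in_multiple_classes(rows):
    print(form, param, n)
```

Only exact string matches are compared here, so forms that differ in transcription details would not be flagged.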

MervideHeer commented 3 months ago

Hi, Thank you for the comment and feedback. Points like this are valuable so we can critically examine and develop our dataset further.

The items listed are indeed all borrowings as you pointed out. The different cognate groups are intentional but definitely worth double-checking especially if they are confusing.

The multiple cognate classes are mostly caused by parallel borrowing, which is quite a tangle. As the cognate column represents vertical relationships, more recent borrowings acquired independently need their own sets. Almost all the words are Russian borrowings, which are relatively recent and usually noted in our source literature too. The different groupings are also intended to reflect that a word was likely borrowed from the same donor language into different recipient languages at different points in time.

Unfortunately, the source literature does not discuss all languages equally, and the stratification of some borrowings is often not very detailed. We have attempted to capture this with the chosen donor language labels and the borrowing certainty tags (clear/probable/possible), which are also included in the datasheet. For some languages, it is very likely that a word comes from the same origin as in its sister languages, but in the literature even closely related languages are sometimes discussed with different degrees of certainty. In this way UraLex is tied to, and reflects, the state of research.

In addition, some languages lack a strong standard, and it has been necessary to include some alternate forms which are very similar but are put in different cognate sets. We are hoping to resolve these compromises in the future.

A few comments on the items:

ID103 ‘and’: these words are parallel borrowings, mainly from Russian (‘i’ and ‘da’). The grouping of the ‘ja’ word derives from the complex contact situation between Finnic, Saamic and (North) Germanic languages. While the Finnic word is a Germanic borrowing, the Saamic languages have often borrowed it from a Finnic source. For some words, it is also possible that the word has been mediated via a Scandinavian language.

ID154, 145, 253, 245, 153, 254, 133: independent borrowings from Russian, likely at different times or at least in different geographical locations.

Meanings 145 and 253 share the form mužik because in some languages it fills both the ‘man’ and the ‘husband’ slots.

ID298 ‘uncle’: the form appears twice. The first is the Estonian word for the meaning, which belongs to a cognate set together with its cognates. The second is an internal borrowing and is given its own set.

For the upcoming UraLex version, we have corrected and improved the separation of parallel borrowings where possible, so there will be more classes. I hope this has clarified the situation a bit. We are always happy to hear about new literature and ideas that help to make UraLex more accurate and usable.

HedvigS commented 3 months ago

Thanks @MervideHeer ! That sounds great. I'm happy to close the issue if that's ok with you.

MervideHeer commented 3 months ago

Sure! We can close the issue.

lexibank / uralex

possible cognate review on a few items? #19