lingpy / evaluation-paper

Annotating Cognates in Phylogenetic Studies of South-East Asian Languages
GNU General Public License v3.0

one more column of compound word structure #41

Closed Wu-Urbanek closed 2 years ago

Wu-Urbanek commented 3 years ago

Please add a column in the Edictor of compound structure.

LinguList commented 3 years ago

Done, check http://lingulist.de/edictor/links/liusinitic.html

Wu-Urbanek commented 2 years ago

Notes to the "COMPOUNDS" column:

A = adj.
N = noun
N2 = repeated N
Gen. = genitive
Suf. = suffix
Pron. = pronoun
V = verb
Conj. = conjunction
Prep. = preposition
Particle = ptcl.

LinguList commented 2 years ago

Can't we make this a bit prettier?

Gen. -> G, Suf. -> S, Conj. -> C, Prep. -> P, Particle -> Pt
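
A minimal sketch of how the shorter labels could be applied to an existing annotation string. The mapping follows the suggestion above (with "S" written without a trailing period, as in the later tables); the function name `shorten_labels` and the space-separated input format are assumptions made for illustration, not part of the Edictor.

```python
# Mapping from the long labels to the shorter ones suggested above.
SHORT_LABELS = {
    "Gen.": "G",
    "Suf.": "S",
    "Conj.": "C",
    "Prep.": "P",
    "Particle": "Pt",
}


def shorten_labels(structure):
    """Replace long labels in a space-separated structure string.

    Labels without a shorter form (e.g. "N" or "V") are kept unchanged.
    """
    return " ".join(SHORT_LABELS.get(label, label) for label in structure.split())


print(shorten_labels("N Suf."))      # N S
print(shorten_labels("Pron. Gen."))  # Pron. G
```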

Wu-Urbanek commented 2 years ago

I spot some typos in the annotation and the word forms. I will correct those and leave notes.

Wu-Urbanek commented 2 years ago
Part of speech Times
V 1382
S 384
N 2765
N2 46
P 43
A 875
G 13
Adv. 20
C 45
Pron 257
Pt 8
CLS 23
? 8
8
Adv 21
Pref 10
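
A count table like this one can be produced by tallying the labels in the compound column. The following is only a sketch: the sample list is hypothetical, and the helper names `count_labels` and `near_duplicates` are made up for illustration. The second helper flags labels that differ only by a trailing period, which is one way typos such as "Adv." next to "Adv" become visible.

```python
from collections import Counter


def count_labels(structures):
    """Count how often each label occurs across all structure strings."""
    counts = Counter()
    for structure in structures:
        counts.update(structure.split())
    return counts


def near_duplicates(counts):
    """Return pairs of labels that differ only by a trailing period."""
    return sorted(
        (label, label.rstrip("."))
        for label in counts
        if label.endswith(".") and label.rstrip(".") in counts
    )


# Hypothetical sample, not the real data.
sample = ["N N", "A N", "V", "N S", "Adv. N", "Adv V"]
counts = count_labels(sample)
print(counts["N"])              # 5
print(near_duplicates(counts))  # [('Adv.', 'Adv')]
```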
LinguList commented 2 years ago

Thanks, how do you interpret the results in the table?

Wu-Urbanek commented 2 years ago

Corrected the typo. The concept "how" has two words, 采翘 (Jixi) and 纳亨 (Suzhou), which are multisyllabic and cannot be separated into morphemes. Therefore, "Adv Adv'" is used. N2 means a reduplicated morpheme (like 星星). Pt is "particle"; I think these can be merged with Suffix.

Part of speech Times
V 1382
S 385
N 2773
N2 46
P 43
A 875
G 13
Adv 42
C 45
Pron 257
Pt 8
CLS 23
? 4
Adv' 2
Pref 10
LinguList commented 2 years ago

I'd suggest using Adv2 for consistency. What are the "?" cases?

Wu-Urbanek commented 2 years ago

"who" Wenzhou: 什 "呢" 人 "husband" Guangzhou: "先" 生 "here" Loudi: 这 嗒 里 "下" "live(alive)" Guilin: 活 個 "囗"

LinguList commented 2 years ago

呢 is a particle.
先 is an adjective in Ancient Chinese.
下 is a noun according to Chinese grammar.

Is 囗 kǒu, or is it just a character indicating a blank, where we do not know the běnzì?

Wu-Urbanek commented 2 years ago

I can ask Yunfan if he knows about this 囗. It is not shown in Liu's dictionary, and I am not sure if there is another 桂林話 dictionary. I corrected the rest.

LinguList commented 2 years ago

x w o ²² + k ɤ ⁰ + t e ³³

this is the form, so it is "de", a particle!

Wu-Urbanek commented 2 years ago
Great! I have corrected it. The new table is:

Part of speech Times
V 1382
S 385
N 2774
N2 46
P 43
A 876
G 13
Adv 42
C 45
Pron 257
Pt 10
CLS 23
Adv2 2
Pref 10
LinguList commented 2 years ago

Okay, I think we have a tiny problem with the analysis now: I am afraid it is not necessarily what the reviewers demanded.

LinguList commented 2 years ago

Compare "ash", which has N N in all cases, vs. "fruit", which has A N. Distinguishing the two is correct, but in 水果, 水 is not an adjective but a noun that serves as a modifier here. An adjective is also a modifier, but it has no concrete meaning of its own and, unlike a noun, can be augmented with the particle de in Chinese: you can say "hóng de hěn", but not "shuǐ de hěn". So the annotation should indicate that the first noun modifies the second one, for example NM N ("noun modifier") or similar. The reviewer asked for this kind of analysis because the idea was that one could automatically derive the salient part, e.g. by taking the noun that was modified.
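
The reviewer's idea of deriving the salient part automatically could look roughly like the sketch below. It assumes a space-separated structure string and treats "NM" and "A" as modifier labels; both the label set and the function name `modified_positions` are assumptions for illustration. It only picks the modified elements and makes no claim that these are always the salient ones.

```python
# Labels treated as modifiers in this sketch (an assumption, not a fixed scheme).
MODIFIER_LABELS = {"NM", "A"}


def modified_positions(structure):
    """Return the 0-based positions of morphemes that are not modifiers."""
    labels = structure.split()
    return [i for i, label in enumerate(labels) if label not in MODIFIER_LABELS]


print(modified_positions("NM N"))  # [1]
print(modified_positions("N N"))   # [0, 1]
```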

LinguList commented 2 years ago

A good starting point for the different nouns in Mandarin Chinese (and the classification will also hold for the dialects in almost all parts) is Packard 2000, "The Morphology of Chinese", and I am sure one can find some classification there that could be used. I do not know if I asked you to consult this before or mentioned it somehow. In any case, I assumed it was clear that the reviewer referred to the derivational syntax, as it is also called, which one would have to transcribe here.

LinguList commented 2 years ago

So I am afraid we need to come up with a concrete classification scheme first and then apply this to the data. Sorry that I did not see the problem earlier.

LinguList commented 2 years ago

@MacyL, do you think you could come up with a first schema following Packard 2000 to discuss with me? In fact, your work is not for nothing, as one can use it to search for only those ambiguous cases and annotate them later, once the schema has been determined.

LinguList commented 2 years ago

Chinese words may also be characterized in terms of the type of modification relationship that obtains between morphemes, or, in other words, ‘what modifies what’. The modification structure can take a juxtapositional, ‘flat’ form, in which the two morphemes are structurally ‘parallel’ and neither modifies, or is subordinate to, the other; or it can take a hierarchical form, with one constituent modified by and therefore structurally ‘dominating’ the other.

Packard 2000 page 22f

Wu-Urbanek commented 2 years ago

sure :)

LinguList commented 2 years ago

So since you already know more or less what awaits us here, please provide a first classification with examples. I'll then double-check it, and we then apply it (I'll double-check all of your annotations).

LinguList commented 2 years ago

But for starters, you could pull out all types and tokens here for full words, so that we know how many N N words there are, how many A N, etc. (shuǐ guǒ is of course N N!). Having seen these, we would double-check all cases and add a syntax column that shows the relations.

Wu-Urbanek commented 2 years ago
So a table like:

Type Concepts
N N 水果, Concept 2, Concept 3
A N Concept 1, Concept 2, Concept 3
LinguList commented 2 years ago

For example, yes.
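
A table of this kind could be pulled out with a small grouping step. This is a sketch under assumptions: the input is a list of (word form, structure) pairs, the sample rows are illustrative rather than an export of the dataset, and `group_by_structure` is a hypothetical helper name. Types are the distinct word forms per structure, tokens the number of rows.

```python
from collections import defaultdict


def group_by_structure(rows):
    """Group word forms by structure and count tokens per structure."""
    types = defaultdict(set)   # structure -> distinct word forms
    tokens = defaultdict(int)  # structure -> number of occurrences
    for form, structure in rows:
        types[structure].add(form)
        tokens[structure] += 1
    return types, tokens


# Illustrative rows only; the same form can occur in several varieties.
rows = [("水果", "N N"), ("星宿", "N N"), ("水果", "N N"), ("老蛇", "A N")]
types, tokens = group_by_structure(rows)
for structure in sorted(types):
    print(structure, tokens[structure], ",".join(sorted(types[structure])))
```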

Wu-Urbanek commented 2 years ago

It is a large table, but I think we are able to see some patterns here for setting up an annotation scheme.

Compound type Chinese
V 亡刀,抵,推,捆,朵,腐,巴,躺,掼,樂,逃,游,吃,呷,摔,縛,說,話,刺,汧,站,喝,剉,拈,把,落,啜,聽,浮,𢬮,连,叨,搦,宰,哜,跌,逿,驚,繗,嗍,唬,𢱋,唚,綁,揩,摣,跑,𢱤,縫,無,㖹,漂浮,講,聞,特,眠,烧,互,楠,摀,诧,䋎,給,坐,攞,仍,擦,死,斫,摕,甩,戳,唱,淌,立,點,斬,扎,拍,刨,拉,頓,榨,玩,掠,乃豈,歹,食,數,壓,?,吸,嚙,打,刷,流,囗,遁,杀,鼻,丟,掏,嘔,逮,瓜,掰,畏,抾,洗,共,拭,跪,转,啄,搲,唠,扯,抹,撕,吹,漂,剁,嬉,撂,揎,㰵,㧍,拖,刣,見,倒,撥,㧸,行,呼,吐,抓,笑,啡,𪘬,居,吮,來,睡,撖,掀,拋,擔,想,徛,氽,走,挑,碰,裹,喫,咬,捉,爛,飲,砍,攪,挖,手點,怕,飛,啉,翻,耍,掉,𫦉,挨,擺,嗅,嫽,拿,挤,睏,乞
V V 腐爛,瞅見,嬉戏,睇見,漂浮,看見,扯破,𨑨迌,会晓,嘔吐,聽到,會仈,睡倒,活絡,撕開,知得,擺𠲥,拆扌丽,飲腐,撕裂,聽着,拿賜,角逆,看倒,聽見,知道,走來,游泳,縫纫,看着,洗澡,帶囗,仈傳,看到,浮倒,扯裂,囗得,晓得,扯開,呼吸,知影,㽹惡
V S 活葛,活其,捆起,活着,爛了,睡起,立着,吐了,爛起,懂了,嘔了,嘔嘞,打起,朿見着
V V S 發吐了
N N 星宿,羽毛,膝蓋,女官,堂客,耳朵,喙齒,眼窝,女人,腹肚,目珠,脂肪,蟲蟻,晚上,板油,树皮,腳板,眼睛,丈夫,頸骨,塵土,頭頸,毛毛,坌塵,天頂,翼梭,娭,膝盖,塗粉,頸項,翅𦑜,凌冰,心臟,脊背,右面,下昏,蒙纱,肚腸,右片,天空,天里,厝里,涂地,月娘,翼毛,嘴舌,云彩,頷頸,土地,家叉+蟲,星星,阿姆,左面,年份,背脊,女仂,下晚黑,女界,口舌,媳婦,肉皮,乳房,迷露,男界,鼻公,骹骨,黃昏,腦壳,夜下,翼膀,母親,右边,小边,目睭,棍棍,頸根,扣頸,天上,右手,腸子,虱嫲,蛇哥,种籽,眼,树林,娘妳,男仂,雞蛋,完心,婆娘,男客,翅膀,父親,姆媽,翼胛,男人,肚皮,眼珠,女客,名字,济面,地皮,奶奶,翅光,郎罷,脚爪,頭毛,油脂,夜裡,皮膚,叶?,月光,伊姐,女婿,奶姑,繩索,霧氣,虱母,左边,灰塵,頭壳,雞卵,鼻孔,后晌,胛脊,右岸,牙齒,骹腿,脰骨,左手,夜晚,頭髮,奶房,鼻哥,奶婆,葉光,森林,滃塵
N 江,日,肝,翅,尾,水,三,腸,膝,根,風,歲,蜀,狗,骹,爪,墿,二,雀,叶,皮,湖,背,一,奶,煙,鱼,兩,爷,棒,达,树,路,霜,娃,骨,膀,心,姐,牙,灰,頸,爺,地,毛,塵,河,花,頭,竹若,嘴,幔,數,父,?,肉,耳,囗,霧,鼻,冰,天,雪,火,年,眼,鳥,孩,角,舌,雨,喙,种,云,山,膏,草,索,海,蛇,棍,爹,匝,齒,团,某,肚,道,腿,血,媽,腳,犬,星,四,人,蟲,莖,五,盐,月,手,沙,𩪘,娘,繩,油,爸,載,虱
N N2 竹若竹若,奶奶,煙煙,子子,角角,灰灰,娒娒,爸爸,娃娃,星星,雀雀,媽媽,崽崽,尾尾,爺爺,爪爪,骨骨,索索,沙沙,腳腳,渣渣,妈妈,皮皮,爹爹
N S 林子,牙子,牙巴,舌仔,天中,盐巴,腸?,棍子,棍儿,孩儿,石頭,土垃,骨都,翅子,腸仔,雀子,沙子,花?,繩子,娘們,兜子,奶子,鳥唧,皮子,舌头,歲頭,鼻,星?,腸子,肚子,犬囝,膝頭,爪子,嘴巴,鼻仔,奶仔,叶子,囝仔,鼻子,沙儿,尾巴,索子,雞子,种?,妻子,娃儿,角,膝囗,舌頭,鳥,树儿,煙子,果們,果子,日頭,棒儿,花儿,脖子,尾子,耳仔,鳥儿,鼻頭,腹老,捶仔,索仔,渣子,花仔,汉字,翅仔,虱子,种子,舌嫲,肚皮,耳公,骨頭,孩子,舌子,雀儿,頸子,星子,罩子
V P 漂𡅏,浮𡅏
A 賁,厚,拐,潮,脏,紅,滿,嫩,呆,寒,綠,大,冷,長,⽧臺,重,彪,干,囗,多,烏,恘,操,闊,老,肿,歁,鄙,黃,濫,壞,瓜,寡,白,凍,浮,痞,進,小,平,對,焦,少,薄,爆,黑,直,熱,寬,凊,笨,愚,圆,狭,尖,短,細,窄,湿,毛,木,雛,否,远,新,好
A N 老妪,圆圈,咸盐,长虫,该搭,眾牲,老蛇,这疙,水果,細儿,该里,老媽,老爸,只里,老汉,小嘎,老姆,倒片,这块,老公,暗晡,暝晡,老娘,太陽,竹固裏,老安,老婆,小孩,正边,尔搭,壯肉,这?,即带,咯裏,先生,全部,小人,这里,大边
A N S 細人儿,老娘們,老爺們,老爷们,老倌子,細伢子
N A N 爺老倌,梭老二,天老爺,娃大妈,眼烏珠,家主婆
N N N 屋裡人,背骶身,五爪龙,女人家,男人家,左边拉,丈夫人,夜裡向,簸棱盖,膝蓋骨,額膝骨,波羅蓋,丈夫儂,左手边,克膝蓋,波棱蓋,伲囝哥,眼灵珠,右手边
N G 男的,女的
A V N 內當家
N N2 S 婆婆子
A A 蒌馊,鏖糟,拉渣,活囗,囗赖,歯屋歯足,邋遢,窄狭,瓜肿,厚實,埋汰,個郎,流疡,老小,腌臢,汚糟,龌龊
A A V 個郎下
V N S 打交儿,構棱子,洗身儿,泅河儿
N N S 諸娘儂,褲膝頭,腳膝頭,虱婆子,左边儿,手把子,腳饅頭,沙婆子,腿把子,肥肉子,克膝头,腳胐頭,夜間子,骹腹頭,腿肚子,脖颈子,翼胛子,树林子,囗囗子
Adv 伓,勿,唔,否,不
C N 如果,若果
V V N 打比说
C C 假使
C Pron 要是,若是,假系
C V 若还,倘若
V N 打仗,构凌,拍獵,干仗,跳虱,爱人,下凜,關山,倒落,動物,游水,打獵,透氣,玩水,結冰,出氣,浮水,𢻕氣,吸氣,凍冰,喘氣,打捶,泅水,游泳,打錘,洗澡,得腦,扯构,打架,無綻,呼氣,囗水
A V 透流,呢到
Pron Pron 狃宕,俚竹+马,底隹又,什麼,哪疙,我俺,邊度
Pron 竹固,俚,啥,𪜶,我,即,㑑,歸,他,囗,只,𠹼,那,者,回,許,渠,𠰻,你,佳,哪,呢,俺,佢,迄,怎,誰,伊,如,该,倷,这,尔,咯,乜,吾
Pron Pt 什哩,何呢,甚仂
Pron S 啥子,渠俫,他㑲,麽子,佢哋,哪嘞,我倈,哪?,吾伲,我們,那,我哋,他們,麼子,渠們,如儂
Pron CLS 脉個,哪個,啷個,咋个,吗个
Pron N 瞞人,什物,那塊,底儂,底所,那里,哪搭,啊搭,那,迄帶,夫塊,哪塊,乜嘢,旁單,我人,彎面,啥人,什乇,伊儂,侬家,渠人,哪里
N S S S 石頭子儿
N S N 石頭塊,膝頭骨,妹?人,日頭爺,男?人,男子客,腳頭足夫,日頭窠,外头人
Pron Pt N 何呢人
Pron P CLS 何至個
V Adv 撕烂
V N N 彈四郎
Pron G 怎的,他的
Pron S N 佢丁人,𠊎丁人
Pron G CLS 他的個
Pron C Pron 伊各儂,奴各侬
V V V 躺倒起
N A 頭那,月明,月亮,月光,下黑
N Pron N 娃他爸,頭那髮
Adv N 将样,白相,点样,怎样,将其,酿般
N V 骹遛,賭打
A Pt 黑咧
A S 進兜,生嘅,黑了,老子
N S S 腳丫子,囗囗囗,尾巴
Pron S S 我竹固哩
Pron S N CLS 我囗囗囗
Pron N C Pron 侬家各侬
N S N N2 树子叶叶
P V 因为
P Pt 庸乎
V A N 淴冷浴
N N Pt 右边拉
N N N2 右边边
N A S S 月亮巴巴
P 着,在,宿
Adv S 怎么
Adv Pron 些那,怎底,如何
Adv Adv2 纳亨,采翘
Adv V 和是,相拍
V N G 掌櫃的
A CLS 该个,囗个
A Pt N N 这个地方
A N N N 这嗒里下
V N V 打相打
N N V 四骹爬
Pref N 阿奶,依爹,依爸,依奶,阿爸,依爺,阿姆,依媽
A N N2 小娃娃
A N N 老娘客,細人家
N A S 娘老子,伢細子
A CLS N 細個崽
A CLS S 細個子
V A 帶勁,不賴,不糙
C 搭,同,和,跟,伉,共
V CLS 生個,健個,活個
V G 活的
V CLS Pt 活個囗
Pron V N 哪跟前
Pron V 何至
N Pron 里囗
Pron CLS N N 那個地方
CLS V 個到
A A N S 大老爺們
Pref N N 查某人
LinguList commented 2 years ago

What do you think are those cases I meant, which should be sub-classified?

Wu-Urbanek commented 2 years ago

Do you mean which types of compound count as meaning-limiting, oppositional, modification, and cause-effect?

LinguList commented 2 years ago

Well, NN compounds are at least two types, maybe three:

But I suspect there are more. Do you see why I distinguish these four as different types?

A tip: look at the "motivation", at the relation between the compounds (all NN according to your analysis) and the concept they denote.

Note: this is not necessarily covered in Packard, unfortunately. But it is the key for the answer we'll give to the reviewer.

Wu-Urbanek commented 2 years ago

If I use Kratochvil's (1970) classification:

"earth" is the coordinate compound (土 and 地).
The others are subordinate compounds:
"woman" is the meaning-limiting type (女 modifier, 人) in Packard's book, but attribute-head in Kratochvil (1970).
"tongue" is probably the attribute-head type (the same as 女人): 口 or 嘴 is the location, 舌 is the head/dominant part.
"ear, eye, heart": head-modifier (my alternative analysis to Kratochvil's head-modifier): 耳 head, 朵 modifier; 眼 head, 窩 modifier; 心 head, 臟 modifier.
I am not sure about "feather" (羽毛). It should be head-modifier, but somehow I have doubts about this one.

Chen's (2013) classification:

"earth" is the copulative compound (土 and 地), i.e. the coordinate compound above.
"woman": endocentric
"eye, ear, feather, heart": complement (補充型: head-modifier)
"tongue": endocentric (probably)

LinguList commented 2 years ago

Yep, coordinative or copulative compounds are one special class that is specific to Chinese, and these need to be identified and marked as such.

The subordinate compounds are tricky, and this is what we need to tell the reviewer, since one cannot say which part is "salient".

Coordinative compounds can be thought of as developing in alternation with simplex forms: we have tǔ + dì, which may later resolve to either dì or tǔ, both being possible. Likewise, we can have both tǔ and tǔdì in one language, with speakers alternating between the forms until, at some point, one is lost. So we cannot say that ONLY tǔ or ONLY dì is salient; BOTH should be.

The subordinate compounds are more difficult. The meaning-limiting modifier bears the main meaning here, and we can think of alternating forms like nǚ-rén vs. nǚ-shì, etc., and at one point in evolution only one is left, so the salient part is probably nǚ, right?

In the case of "tongue", the situation is again quite difficult: the salient part is shé, of course, but we can again think of alternating forms in the same language, like kǒu shé or shé tou. So again it is not clear what process this comes from and what the original form was, or whether we should say that kǒu shé is equivalent to shé tou, since shé is the most salient part here, or whether we should pay attention to the difference between them.

In the case of yǔ máo it is even worse: máo typically also means "feather" in Chinese, so yǔ is specifying it, but yǔ is also "feather", so the salient part would again be yǔ.

I think this is the major argument that we need to make clear in the paper: just saying "I classify the compound syntactically, with relations" is not enough, since the semantics is important. It is not just the "head", as the reviewer says. So by extending our examples in the text, we can defend the salient-part notation.

Wu-Urbanek commented 2 years ago

I totally agree with the point of making it clear that "it is not just identifying the head". In 土地, both should be salient. 女 in 女人 is the salient part, so it is the endocentric or head-attribute type.
As for the tongue, I will identify 舌 as salient in the compounds with 口 and 頭. But cases like these might be tricky, so, as you said, we may need to discuss them case by case when we run into this type. (It reminds me of "sweat" = "skin + sweat" in a Tupian language.)

My feeling is that although salient parts cannot be defined in one sentence, the morphemes' part of speech as well as the semantics can help us identify the salient morphemes. (This is my reflection on the analysis and on reading Packard's book, Ch. 3.)

LinguList commented 2 years ago

Yes, identifying the compound relations, which I tried to do informally with the morpheme glosses, helps to identify the salient parts. Of course, one would ideally provide a formal syntactic compound structure, but one would then also have to add the motivation, which is much less clear. We can add this to the draft, and also emphasize that we have annotated syntactic structures, as we discussed. Luckily, there are not too many cases with NN compounds, and you should check the cases which you annotated as AN, so that they are corrected where they are actually NN, as I mentioned before. So we decide on a way to annotate compound structures, add those for all cases, and then add this passage to the paper explaining why it is not enough :)

Wu-Urbanek commented 2 years ago

The question is: do I go through the dataset one more time to annotate all the different types of compounds, or do I just review the "AN" compounds to see if they are NN?

LinguList commented 2 years ago

I would ask you to first only review the AN cases and check whether they are NN. We then pull out the table again and decide.

Wu-Urbanek commented 2 years ago
Compound type Chinese
V 乃豈,擦,共,攪,縛,呼,揩,把,碰,扯,挖,亡刀,抾,摕,拿,叨,唱,喝,巴,氽,點,互,嘔,连,宰,㰵,眠,拈,抵,㧸,數,拋,說,食,講,捉,腐,來,榨,挤,畏,乞,吃,啉,戳,落,漂浮,頓,䋎,攞,抓,睏,睡,洗,掉,飛,壓,斫,掀,想,仍,嗅,唚,咬,吮,楠,繗,𢱤,拉,打,囗,瓜,摀,漂,撕,拖,逃,歹,擔,喫,呷,掏,刺,遁,𢬮,撂,裹,摣,㖹,給,逿,㧍,吹,流,手點,吐,搲,刣,斬,甩,啡,笑,捆,行,走,啜,驚,𫦉,拍,吸,挨,丟,翻,𢱋,𪘬,砍,朵,鼻,哜,刨,诧,耍,撖,拭,無,掰,死,游,徛,烧,浮,刷,怕,抹,?,聽,唠,逮,跑,剁,剉,樂,爛,揎,搦,啄,躺,話,飲,擺,站,淌,聞,摔,立,特,汧,转,推,掠,扎,掼,杀,撥,跪,縫,嬉,玩,跌,嫽,挑,倒,嚙,居,嗍,見,坐,綁,唬
V V 睇見,㽹惡,擺𠲥,看着,嘔吐,扯裂,扯開,拿賜,撕裂,扯破,活絡,仈傳,撕開,腐爛,看到,縫纫,洗澡,知得,知影,游泳,𨑨迌,看倒,知道,角逆,会晓,會仈,睡倒,浮倒,呼吸,聽見,走來,晓得,聽着,帶囗,聽到,瞅見,拆扌丽,看見,漂浮,飲腐,嬉戏,囗得
V S 爛了,活其,睡起,捆起,打起,懂了,爛起,嘔了,吐了,立着,活葛,嘔嘞,朿見着,活着
V V S 發吐了
N N 老娘,蒙纱,翅膀,頷頸,脚爪,水果,老媽,大边,羽毛,目睭,家叉+蟲,老汉,老妪,翼毛,牙齒,济面,男仂,脂肪,葉光,圆圈,地皮,雞卵,女婿,凌冰,丈夫,塗粉,塵土,父親,骹骨,男界,骹腿,頸根,种籽,右手,天里,下黑,天上,虱母,腸子,右片,霧氣,翅光,油脂,森林,右面,腹肚,男人,乳房,天空,虱嫲,姆媽,膝蓋,黃昏,左面,娘妳,土地,老婆,男客,女官,迷露,口舌,頸項,翅𦑜,滃塵,女客,棍棍,伊姐,暝晡,婆娘,板油,夜裡,后晌,奶房,心臟,左边,頭毛,腦壳,完心,女仂,晚上,眼珠,繩索,眼睛,娭,胛脊,月光,脊背,皮膚,夜下,星星,眼,太陽,雞蛋,年份,涂地,右岸,云彩,母親,媳婦,右边,蛇哥,頭髮,奶奶,叶?,肚皮,全部,肉皮,目珠,耳朵,阿姆,女界,下昏,天頂,鼻公,小边,蟲蟻,頸骨,厝里,背脊,毛毛,翼梭,正边,暗晡,老安,腳板,嘴舌,夜晚,頭壳,月娘,肚腸,灰塵,树林,堂客,郎罷,翼膀,女人,鼻哥,树皮,老小,名字,奶婆,翼胛,鼻孔,喙齒,坌塵,脰骨,老公,星宿,扣頸,眼窝,左手,奶姑,膝盖,頭頸
N 腳,膀,載,心,耳,霜,江,頸,二,姐,歲,腸,霧,肝,湖,舌,數,道,肉,蜀,人,喙,角,翅,孩,骹,尾,五,蟲,冰,兩,一,星,奶,莖,頭,日,云,膝,媽,种,匝,嘴,囗,油,树,幔,膏,娘,雀,父,圆,鳥,达,𩪘,雪,草,地,盐,索,竹若,雨,火,血,娃,犬,根,手,沙,塵,肚,毛,河,皮,眼,腿,某,海,鼻,水,棍,三,灰,爺,墿,鱼,花,?,四,爷,風,煙,繩,山,狗,骨,爸,蛇,天,齒,团,棒,虱,年,叶,爹,路,月,背,爪,牙
N N2 沙沙,雀雀,腳腳,爸爸,崽崽,竹若竹若,皮皮,妈妈,子子,爪爪,星星,灰灰,爹爹,媽媽,索索,煙煙,娃娃,奶奶,角角,骨骨,尾尾,娒娒,渣渣,爺爺
N S 黑了,牙巴,骨頭,肚子,耳仔,果們,犬囝,捶仔,鳥儿,牙子,繩子,翅子,星子,种子,渣子,腸?,尾巴,果子,沙子,鳥唧,腸仔,土垃,奶仔,耳公,棍儿,娃儿,膝囗,叶子,棒儿,舌嫲,日頭,雀儿,娘們,星?,脖子,鼻子,罩子,盐巴,爪子,汉字,老子,鼻,囝仔,骨都,尾子,孩儿,虱子,树儿,索仔,沙儿,花仔,雀子,妻子,奶子,膝頭,棍子,种?,舌仔,孩子,腹老,歲頭,肚皮,天中,林子,鳥,舌頭,頸子,舌子,鼻頭,鼻仔,嘴巴,索子,翅仔,角,花儿,腸子,石頭,兜子,花?,雞子,煙子,皮子,舌头
V P 漂𡅏,浮𡅏
A 笨,寬,壞,小,重,拐,焦,嫩,短,老,彪,對,寡,白,多,冷,新,爆,凍,呆,細,肿,毛,厚,直,黃,歁,黑,雛,干,否,木,進,囗,狭,湿,远,瓜,恘,凊,綠,薄,大,賁,滿,鄙,好,紅,愚,濫,長,浮,闊,烏,平,窄,⽧臺,痞,脏,寒,潮,少,熱,操,尖
N N S 老娘們,老爷们,克膝头,虱婆子,肥肉子,腳饅頭,夜間子,諸娘儂,腿肚子,树林子,囗囗子,腿把子,脖颈子,沙婆子,左边儿,腳膝頭,骹腹頭,腳胐頭,手把子,褲膝頭,翼胛子
N A N 家主婆,天老爺,眼烏珠,娃大妈
N N N 波棱蓋,背骶身,簸棱盖,丈夫人,屋裡人,男人家,右手边,左手边,五爪龙,額膝骨,爺老倌,波羅蓋,丈夫儂,夜裡向,梭老二,克膝蓋,左边拉,女人家,下晚黑,眼灵珠,膝蓋骨,伲囝哥
N G 女的,男的
N V N 內當家
N N2 S 婆婆子
A N 小人,咸盐,全部,老蛇,壯肉,細儿,小嘎,长虫,先生,小孩,眾牲
A A V 個郎下
V N S 洗身儿,構棱子,泅河儿,打交儿
Adv 唔,勿,不,否,伓
C N 如果,若果
V V N 打比说
C C 假使
C Pron 假系,若是,要是
C V 若还,倘若
V N 跳虱,結冰,爱人,無綻,浮水,倒片,打架,打獵,吸氣,出氣,透氣,打捶,玩水,扯构,打仗,倒落,𢻕氣,得腦,洗澡,呼氣,游泳,游水,喘氣,泅水,下凜,關山,凍冰,囗水,构凌,動物,打錘,拍獵,干仗
A V 透流
Pron Pron 邊度,哪疙,俚竹+马,底隹又,什麼,我俺,狃宕
Pron 𪜶,者,他,迄,乜,竹固,如,那,你,哪,俚,伊,怎,俺,佢,許,囗,渠,即,回,尔,𠹼,呢,倷,这,该,歸,吾,啥,咯,誰,我,㑑,𠰻,只,佳
Pron Pt 何呢,什哩,甚仂
Pron S 渠們,哪?,他㑲,哪嘞,那,我們,我哋,他們,如儂,麼子,渠俫,我倈,啥子,佢哋,麽子,吾伲
Pron CLS 这块,吗个,咋个,哪個,脉個,啷個
Pron N 什乇,这疙,哪里,彎面,该里,这里,只里,那,啥人,底所,渠人,啊搭,竹固裏,迄帶,旁單,该搭,哪搭,夫塊,尔搭,哪塊,那塊,伊儂,即带,我人,底儂,瞞人,什物,咯裏,乜嘢,侬家,那里,这?
N S S S 石頭子儿
N S N 腳頭足夫,日頭窠,妹?人,日頭爺,石頭塊,膝頭骨,外头人,男子客,男?人
Pron Pt N 何呢人
Pron P CLS 何至個
V Adv 撕烂
V N N 彈四郎
Pron G 他的,怎的
Pron S N 佢丁人,𠊎丁人
Pron G CLS 他的個
Pron C Pron 伊各儂,奴各侬
V V V 躺倒起
N A 頭那,月亮,月明,月光
N Pron N 娃他爸,頭那髮
Adv N 将样,点样,酿般,白相,怎样,将其
N V 賭打,骹遛
N Pt 黑咧
N S S 囗囗囗,腳丫子,尾巴
Pron S S 我竹固哩
Pron S N CLS 我囗囗囗
Pron N C Pron 侬家各侬
N S N N2 树子叶叶
P V 因为
P Pt 庸乎
V A N 淴冷浴
N N Pt 右边拉
N N N2 右边边
N A S S 月亮巴巴
P 宿,着,在
A A 汚糟,歯屋歯足,蒌馊,龌龊,厚實,活囗,個郎,瓜肿,囗赖,拉渣,窄狭,埋汰,鏖糟,流疡,腌臢,邋遢
Adv S 怎么
Adv Pron 些那,如何,怎底
Adv Adv2 采翘,纳亨
Adv V 和是,相拍
V N G 掌櫃的
A N S 細伢子,老爺們,老倌子,細人儿
A CLS 囗个,该个
Pron CLS N N 那個地方,这个地方
Pron N N N 这嗒里下
Pron V 何至,呢到
V N V 打相打
N N V 四骹爬
Pref N 依爹,依爺,依奶,老爸,老姆,依媽,阿姆,阿爸,依爸,阿奶
A N N2 小娃娃
A N N 細人家,老娘客
N A S 娘老子,伢細子
A CLS N 細個崽
A CLS S 細個子
V A 不賴,不糙,帶勁
C 共,搭,跟,同,和,伉
V CLS 生個,活個,健個
V G 活的
V CLS Pt 活個囗
A S 生嘅,進兜
Pron V N 哪跟前
N Pron 里囗
CLS V 個到
A A N S 大老爺們
Pref N N 查某人
LinguList commented 2 years ago

Okay, there are still cases of reduplication, etc., which are listed as NN, and I would rather say nán and nǚ are adjectives, but from this, we can now start to reflect on how we proceed.

I see two possibilities:

  1. annotate all complex words further and define saliency classes from them
  2. pick explicit examples for coordinative compounds, cases of modifier -> modified, etc., and our saliency decision, which we can then present as a specific kind of analysis in the study, under a section "Examples".

For 2, I would suggest starting by pulling out all saliency decisions for the individual cognate sets, and trying to see if we have patterns (first element, second element, etc.) in the data. We need a good number of examples that also show how difficult it is to come up with straightforward decisions. E.g., having one modifier-modified where modifier and modified are salient in our interpretation, one where this is not the case, etc.
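
Looking for positional patterns in the saliency decisions could be sketched as follows. The encoding is hypothetical: each entry pairs a structure string with one 0/1 flag per morpheme, where 1 marks a morpheme judged salient. The three sample entries follow decisions discussed in this thread; `saliency_patterns` is an illustrative helper, not an existing function.

```python
from collections import Counter


def saliency_patterns(entries):
    """Count (structure, salient positions) combinations.

    Each entry is (structure, flags), where flags holds one 0/1 value per
    morpheme and 1 marks a morpheme judged salient.
    """
    patterns = Counter()
    for structure, flags in entries:
        if len(structure.split()) != len(flags):
            raise ValueError(f"length mismatch in {structure!r}")
        # 1-based positions of the morphemes marked as salient.
        positions = tuple(i + 1 for i, flag in enumerate(flags) if flag)
        patterns[(structure, positions)] += 1
    return patterns


# Decisions as discussed in this thread; the 0/1 encoding is hypothetical.
entries = [
    ("N N", [1, 1]),  # 土地: both parts salient
    ("A N", [1, 0]),  # 女人: 女 salient
    ("N N", [0, 1]),  # 口舌: 舌 salient
]
patterns = saliency_patterns(entries)
for (structure, positions), count in sorted(patterns.items()):
    print(structure, positions, count)
```

With the full data, frequent combinations would show whether one position tends to be salient for a given structure, and rare ones would be candidates for the "Examples" section.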

LinguList commented 2 years ago

If we go for possibility 2 (which is easier), we can mention that we have annotated the types of compounds for convenience (the Part of Speech of the constituents) and that we can classify the compounds, but that the decision is not always straightforward, especially if one does not know the languages in depth, and that we show examples for tough decisions. Then you could pull out some concepts as examples (e.g., the ones which we discussed here so far).

LinguList commented 2 years ago

Ah, @MacyL, for me 右 (yòu) and 左 (zuǒ) are also adjectives, not nouns. At least, they always modify, similar to nán and nǚ; they do not stand alone, and they are adjectives in many other languages.

Wu-Urbanek commented 2 years ago

I agree to go for possibility 2. I am fixing "leftside", "rightside", "女人", and "男人" at this moment, and then I am going to inspect the trees today.


LinguList commented 2 years ago

Okay, to follow up on this part, please also look into examples that we could use to emphasize that saliency is not the same as head-modifier-structure.

Wu-Urbanek commented 2 years ago

A short question:

leftside: 倒片 (VN), 小边 (NN)
rightside: 正边 (NN), 大边 (NN)

Should I change them back to AN?

LinguList commented 2 years ago

Yes, for me, these are all adjectival readings. Of course. Did you change all AN to NN?
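
One way to check which annotations changed between two rounds is to compare the tables directly. A minimal sketch, assuming each round is available as a mapping from word form to structure (a simplification, since the same form can carry different structures in different varieties); `changed_annotations` is a hypothetical helper, and the sample values are read off the two tables above.

```python
def changed_annotations(before, after):
    """Return {form: (old, new)} for forms whose structure changed."""
    return {
        form: (before[form], after[form])
        for form in before
        if form in after and before[form] != after[form]
    }


# Sample values read off the two tables in this thread.
before = {"水果": "A N", "老婆": "A N", "太陽": "A N", "小孩": "A N"}
after = {"水果": "N N", "老婆": "N N", "太陽": "N N", "小孩": "A N"}
changed = changed_annotations(before, after)
print(len(changed))  # 3
```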

LinguList commented 2 years ago

By the way, 倒 also has an adjectival reading here; it is like a participle, "falling side". You must always start from semantics to identify the POS.

Wu-Urbanek commented 2 years ago

Not all AN to NN. 倒面 was always annotated as VN (from my side).


LinguList commented 2 years ago

Okay :) I think that I annotate the POS differently from you due to my background in German and my work on POS in Latin and many other languages, while POS does not really apply well to Chinese.

LinguList commented 2 years ago

But do you think you can suggest some interesting examples where the decision is not easy to make and our saliency is different from the simple "A modifies B" that the reviewer suggested?

Wu-Urbanek commented 2 years ago

Not really an interesting case, but definitely a confusing situation.
If we annotate 男 and 女 as A, then "husband" is a noun, but 男的 will be "AG", and I am not sure that can be a noun compound, and 男人 is "AN". I usually check such cases on https://www.zdic.net/

LinguList commented 2 years ago

"the falling" "der Laufende" are also nouns in German. The combination is a noun, the components are adjectives. It is like a participle, the de in Chinese.

Wu-Urbanek commented 2 years ago

I think the interesting cases were like 老X, for example "wife", 老婆. Yunfan and I had a quick discussion about this case: 老 can usually be an adjective, but 老婆 does not mean an old lady/wife, and the entire noun compound cannot be separated into 老 and 婆. I annotate this case as NN, where the first N is attached to the second N (like N N2, but N2 in our analysis was meant for reduplication).

The same holds for 太陽: 太 has the meaning of "extreme" or "greatest", 陽 is "bright" or "masculine". The two morphemes cannot be separated, so I also annotate them as NN. Again, the first N is attached to the second N.

For the above cases, I also write "multisyllabic" words in the notes column.

LinguList commented 2 years ago

Thanks, THIS is the kind of example I want to have. Here the whole meaning refers to the object, not to one part, etc., and one cannot say "lao po" vs. "lao qizi" or similar, but one has to keep it together! Yes, please make a list of examples with their meanings, so we can add them to the draft and discuss them!