own-pt / openWordnet-PT

OpenWordnet-PT: an open access wordnet for Portuguese
http://openwordnet-pt.org

adjective phrases #182

Open FredsoNerd opened 3 years ago

FredsoNerd commented 3 years ago

In OWN-PT, we found some occurrences of adjectives of the form "de foo". For instance: de ouro, de bronze, de folha, de madeira, de lã. More examples: de Dipylon, de Galloway, de Gilbert. It is very common to use these preposition + noun constructions as adjectives; they are called adjective locutions.

The question here is: should those appear as Words, or should they be considered syntactic constructions?

arademaker commented 3 years ago

https://www.normaculta.com.br/locucao/ defines locução (locution) as "a set of two or more words that convey a single meaning and perform a single grammatical function."

Despite the highlights, a construction like de madeira is 'regular' in Portuguese (I mean, not special in any sense): it follows the N de N pattern. In Universal Dependencies (UD), the analysis would look like this:

image

For this reason, I don't see any value in adding adjective locutions to OWN-PT. Comments are welcome! @leoalenc?
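A minimal sketch of the UD analysis described in this comment, written as a reduced CoNLL-U table and read with plain Python. The example phrase mesa de madeira ("wooden table"), the five-column subset, and the helper names `parse` and `de_modifiers` are assumptions made for illustration; they are not taken from OWN-PT or from the image. In UD, the preposition de attaches to the noun it introduces as `case`, and that noun attaches to the head noun as `nmod`.

```python
# Hypothetical example phrase: "mesa de madeira" (wooden table).
# Columns are a reduced subset of CoNLL-U: ID, FORM, UPOS, HEAD, DEPREL.
ROWS = """\
1 mesa NOUN 0 root
2 de ADP 3 case
3 madeira NOUN 1 nmod
"""


def parse(rows):
    """Turn the whitespace-separated rows into a list of token dicts."""
    tokens = []
    for line in rows.strip().splitlines():
        idx, form, upos, head, deprel = line.split()
        tokens.append(
            {"id": int(idx), "form": form, "upos": upos,
             "head": int(head), "deprel": deprel}
        )
    return tokens


def de_modifiers(tokens):
    """Return (head noun, "de N") pairs for nominal modifiers marked by "de"."""
    by_id = {t["id"]: t for t in tokens}
    pairs = []
    for t in tokens:
        if t["deprel"] == "case" and t["form"].lower() == "de":
            noun = by_id[t["head"]]  # the noun that "de" introduces
            if noun["deprel"] == "nmod" and noun["head"] in by_id:
                head = by_id[noun["head"]]  # the noun being modified
                pairs.append((head["form"], f'{t["form"]} {noun["form"]}'))
    return pairs


print(de_modifiers(parse(ROWS)))  # [('mesa', 'de madeira')]
```

The modifier is recovered from two ordinary dependency relations, which is the sense in which the construction is regular: no dedicated lexical entry is needed to analyse it.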

vcvpaiva commented 2 years ago

Well, the point here is that in English, adjectives like "wooden", "golden", "silky", "sandy" are fairly common. The ones formed with the suffix "-y", in particular, are not well documented in PWN. In Portuguese we have "dourado", but for the others we have to say "de madeira", "de seda" (or "como seda"), "de areia". Even for "golden" we made the distinction: "dourado" vs. "de ouro". So I agree with @arademaker that we don't need special treatment for them in PT, but we do in English.

arademaker commented 2 years ago

As we wrote in our paper https://aclanthology.org/2019.gwc-1.48/:

Semantically tagging (or sense annotating) a corpus is a task of constructing a semantic concordance – a textual corpus and a lexicon so combined that every content word in the text is linked to its appropriate sense in the lexicon (Miller et al., 1993).

It seems to me that Miller never considers WN in isolation. The idea of a semantic concordance is central in this paper from 1993. The work on the gloss tag corpus is an opportunity to test this approach. Take the wooden example:

(v) sand, sandpaper | rub with sandpaper ; sandpaper the wooden surface ;

Here, wooden would be translated as de madeira, as suggested in http://wn.mybluemix.net/synset?id=01386433-v

So, potentially, we will have the mapping from wooden to de madeira via the syntax. But I don't think we should add de madeira to http://wn.mybluemix.net/synset?id=02576489-a.

vcvpaiva commented 2 years ago

So your suggestion is to leave the synset "empty"? I do not agree.

arademaker commented 2 years ago

Explicitly marked as a concept not lexicalized in Portuguese, yes.
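A rough sketch of how this proposal could look as data, under stated assumptions. The `Synset` class, the `lexicalized_pt` flag, the `GLOSS_ALIGNMENTS` table, and the `lookup` function are all hypothetical; OWN-PT's actual schema is not described in this thread. Only the synset id 02576489-a and the pair wooden / de madeira come from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str
    en_words: list
    pt_words: list = field(default_factory=list)
    # Hypothetical flag: False means "concept not lexicalized in Portuguese".
    lexicalized_pt: bool = True


# Hypothetical lexicon entry: no Portuguese lemma, explicitly marked.
LEXICON = {
    "02576489-a": Synset("02576489-a", ["wooden"], [], lexicalized_pt=False),
}

# Hypothetical alignments from a translated, sense-tagged gloss corpus:
# (sense id, English token) -> Portuguese phrase found in the translation.
GLOSS_ALIGNMENTS = {
    ("02576489-a", "wooden"): "de madeira",
}


def lookup(word):
    """Return (synset id, source, Portuguese renderings) for an English word."""
    results = []
    for synset in LEXICON.values():
        if word not in synset.en_words:
            continue
        if synset.lexicalized_pt:
            results.append((synset.synset_id, "lexicon", synset.pt_words))
        else:
            # Fall back to the phrase observed in the corpus, if any.
            phrase = GLOSS_ALIGNMENTS.get((synset.synset_id, word))
            results.append((synset.synset_id, "corpus", [phrase] if phrase else []))
    return results


print(lookup("wooden"))  # [('02576489-a', 'corpus', ['de madeira'])]
```

In this sketch the synset carries no Portuguese lemma, yet a lookup for "wooden" still returns de madeira, sourced from the corpus rather than from the lexicon. Whether that is enough for dictionary use is the point still under debate here.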

vcvpaiva commented 2 years ago

This doesn't make sense to me (copying Leonel Figueiredo de Alencar as well). OWN-PT, like PWN, is used as a dictionary, and a bilingual one at that. To look up "wooden" and be told that there is no corresponding term in Portuguese is not sensible, as far as I'm concerned.
