own-pt / own-en-legacy

The openWordnet-EN, a converted and expanded PWN
MIT License

syntactic markers #43

Open odanoburu opened 5 years ago

odanoburu commented 5 years ago

An adjective may be annotated with a syntactic marker indicating a limitation on the syntactic position the adjective may have in relation to noun that it modifies. If so marked, the marker appears between the word and its following comma. If a lex_id is specified, the marker immediately follows it. The syntactic markers are:

(p) predicate position
(a) prenominal (attributive) position
(ip) immediately postnominal position

(from https://wordnet.princeton.edu/documentation/wninput5wn)
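To make the three markers concrete, here is a minimal Haskell sketch of how they could be modelled as a closed type with a round-trip between the type and the `(p)`, `(a)`, `(ip)` notation quoted above. The names `SyntacticMarker`, `showMarker`, and `parseMarker` are hypothetical and not taken from the existing code base.

```haskell
-- Hypothetical type: one constructor per marker listed in the WordNet docs.
data SyntacticMarker
  = Predicate               -- (p)
  | Prenominal              -- (a), attributive
  | ImmediatelyPostnominal  -- (ip)
  deriving (Show, Eq, Enum, Bounded)

-- | Render a marker in the parenthesised notation of the lexicographer files.
showMarker :: SyntacticMarker -> String
showMarker Predicate              = "(p)"
showMarker Prenominal             = "(a)"
showMarker ImmediatelyPostnominal = "(ip)"

-- | Parse the parenthesised notation; any other string yields Nothing.
parseMarker :: String -> Maybe SyntacticMarker
parseMarker s = lookup s [ (showMarker m, m) | m <- [minBound .. maxBound] ]

main :: IO ()
main = mapM_ (print . parseMarker) ["(p)", "(a)", "(ip)", "(x)"]
```

Deriving the parser from `showMarker` keeps the two directions in sync if a marker is ever added.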

how to represent them in the text format?

I think they are similar to frames, so I see three options:

(1.) encode them as frames;
(2.) include them as another ad hoc thing, like frames, but with its own name;
(3.) put this information in a separate file. (I'm thinking we might want to have a few of those anyway, so this information could be shown in the emacs mode and even be editable there.)

1.

w: abounding
w: galore 1 frame 3     # with 3 meaning (ip)
sim: adj.all:abundant
g: existing in abundance; "abounding confidence"; "whiskey galore"

2.

w: abounding
w: galore 1 marker ip
sim: adj.all:abundant
g: existing in abundance; "abounding confidence"; "whiskey galore"

3.

adj.all:galore:1   ip
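For option 3, reading such a side file could look roughly like the sketch below. It assumes one entry per line, with a `file:lemma:lexid` key followed by whitespace and a bare marker code (`p`, `a`, or `ip`). The `MarkerEntry` record and both function names are hypothetical, and lemmas containing a colon are not handled.

```haskell
-- Hypothetical record for one line of the separate marker file.
data MarkerEntry = MarkerEntry
  { entryFile   :: String  -- lexicographer file name, kept as an opaque string
  , entryLemma  :: String
  , entryLexId  :: Int
  , entryMarker :: String  -- one of "p", "a", "ip"
  } deriving (Show, Eq)

-- | Split a string on ':' characters.
splitOnColon :: String -> [String]
splitOnColon s = case break (== ':') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnColon rest

-- | Parse a line such as "adj.all:galore:1   ip"; malformed lines yield Nothing.
parseEntry :: String -> Maybe MarkerEntry
parseEntry line = case words line of
  [key, m] | m `elem` ["p", "a", "ip"] ->
    case splitOnColon key of
      [file, lem, n] | [(i, "")] <- reads n -> Just (MarkerEntry file lem i m)
      _                                     -> Nothing
  _ -> Nothing

main :: IO ()
main = print (parseEntry "adj.all:galore:1   ip")
```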

arademaker commented 5 years ago

I tend to prefer 2

odanoburu commented 5 years ago

that's my least favorite option from the implementation point of view, since it introduces more ad hoc things. in the wordsense and synset datatypes we've defined we already have fields for frames, while all relations are lumped together in one field. plus only a few adjectives have markers, but all wordsenses would end up having this field -- unless we can think of a better representation. i don't like the idea of treating adjectives specially, but maybe that's one way to go.

data WNWord = WNWord WordSenseIdentifier [FrameIdentifier] [WordPointer]
  deriving (Show,Eq)

-- synsets can be unvalidated or validated
data Unvalidated
data Validated

data Synset a = Synset
  { sourcePosition       :: SourcePosition
  , lexicographerFileId  :: LexicographerFileId
  , wordSenses           :: NonEmpty WNWord
  , definition           :: Text
  , examples             :: [Text]
  , frames               :: [Int]
  , relations            :: NonEmpty SynsetRelation
  } deriving (Show,Eq)
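For comparison, here is a rough sketch of what option 2 might cost in the datatype if the marker were an optional field, so that wordsenses without a marker simply carry `Nothing`. This is not the existing `WNWord`: `WordSense` is a simplified, hypothetical stand-in with plain `String`/`Int` fields, and it assumes that a lex_id of 0 is omitted when rendering, as in the `w: abounding` line above.

```haskell
-- Hypothetical marker type; constructor names are illustrative only.
data SyntacticMarker = Predicate | Prenominal | ImmediatelyPostnominal
  deriving (Show, Eq)

-- | Bare marker code as used in the option 2 text format.
markerCode :: SyntacticMarker -> String
markerCode Predicate              = "p"
markerCode Prenominal             = "a"
markerCode ImmediatelyPostnominal = "ip"

-- Simplified stand-in for WNWord (frames and pointers left out).
data WordSense = WordSense
  { lemma     :: String
  , lexId     :: Int
  , synMarker :: Maybe SyntacticMarker  -- Nothing for almost every wordsense
  } deriving (Show, Eq)

-- | Render a wordsense line in the style of option 2.
-- Assumption: a lex_id of 0 is not written out.
renderWordSense :: WordSense -> String
renderWordSense ws = unwords (["w:", lemma ws] ++ idPart ++ markerPart)
  where
    idPart     = [show (lexId ws) | lexId ws /= 0]
    markerPart = maybe [] (\m -> ["marker", markerCode m]) (synMarker ws)

main :: IO ()
main = mapM_ (putStrLn . renderWordSense)
  [ WordSense "abounding" 0 Nothing
  , WordSense "galore" 1 (Just ImmediatelyPostnominal)
  ]
```

The field still exists on every wordsense, which is the concern raised above; the `Maybe` only makes the common case explicit and cheap to ignore.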