thorunna / UDConverter

A treebank format converter for converting PPCHE-style treebanks into UD treebanks.
Apache License 2.0

rules.py - Fix punctuation relations #12

Open hinrikur opened 4 years ago

hinrikur commented 4 years ago

See the rules from UD:

Punctuation

Tokens with the relation punct always attach to content words (except in cases of ellipsis) and can never have dependents. Since punct is not a normal dependency relation, the usual criteria for determining the head word do not apply. Instead, we use the following principles:

  1. A punctuation mark separating coordinated units is attached to the immediately following conjunct.
  2. A punctuation mark preceding or following a dependent unit is attached to that unit.
  3. Within the relevant unit, a punctuation mark is attached at the highest possible node that preserves projectivity.
  4. Paired punctuation marks (quotes and brackets) should be attached to the same word unless that would create non-projectivity. This word is usually the head of the phrase enclosed in the paired punctuation.

This is not how it works at the moment.

hinrikur commented 4 years ago

Added a function that fixes punctuation marks in enumerations by checking whether the next word has the conj relation label.

This (in all likelihood) does not affect points 2, 3, and 4 above.
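
For illustration, the check described in this comment could look roughly like the sketch below. It is not the code in rules.py: the function name `attach_coordination_punct` and the token representation (a list of dicts with CoNLL-U-style `id`, `form`, `head` and `deprel` keys) are assumptions made for the example. It only handles the case where the conjunct head directly follows the punctuation mark.

```python
def attach_coordination_punct(tokens):
    """Re-attach punctuation to the next token when that token is a conjunct.

    `tokens` is a list of dicts with CoNLL-U-style keys (hypothetical layout):
    id, form, head, deprel. Returns a new list; the input is not modified.
    """
    fixed = [dict(tok) for tok in tokens]
    # Walk over every token that has a right-hand neighbour.
    for i, tok in enumerate(fixed[:-1]):
        nxt = fixed[i + 1]
        # UD principle 1: punctuation separating coordinated units
        # attaches to the immediately following conjunct.
        if tok["deprel"] == "punct" and nxt["deprel"] == "conj":
            tok["head"] = nxt["id"]
    return fixed


# "epli , perur og bananar ." with the comma initially attached to the root
sentence = [
    {"id": 1, "form": "epli", "head": 0, "deprel": "root"},
    {"id": 2, "form": ",", "head": 1, "deprel": "punct"},
    {"id": 3, "form": "perur", "head": 1, "deprel": "conj"},
    {"id": 4, "form": "og", "head": 5, "deprel": "cc"},
    {"id": 5, "form": "bananar", "head": 1, "deprel": "conj"},
    {"id": 6, "form": ".", "head": 1, "deprel": "punct"},
]
fixed = attach_coordination_punct(sentence)
```

In this example the comma is re-attached to "perur" (token 3), while the sentence-final period keeps its original head because no conjunct follows it.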