proycon / folia

FoLiA: Format for Linguistic Annotation. FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

Add a new property to detect tags that may be (or MUST be) used as Wrefs #63

Closed by kosloot 5 years ago

kosloot commented 5 years ago

In the code for folia.py and libfolia, there are several places where we have to check whether an element is a Word, Phoneme or Morpheme and therefore needs special handling, e.g. as a Wref.

Examples (folia.py):

    if isinstance(child, (Word, Morpheme, Phoneme)):
        #Include REFERENCES to word items instead of word items themselves
        attribs['{' + NSFOLIA + '}id'] = child.id

    if isinstance(c,Word) or isinstance(c,Morpheme) or isinstance(c, Phoneme):
        targets.append(c)

    if type is layerclass:
        for e2 in layer.select(AbstractSpanAnnotation,set,True, (True, Word, Morpheme)):
            if not isinstance(e2, AbstractSpanRole) and self in e2.wrefs():
                yield e2

(In the last case, I even wonder whether Phoneme is missing.)

Examples from folia_impl.cxx:

    for ( const auto& el : data ) {
      if ( ( el->element_id() == Word_t ||
             el->element_id() == Phoneme_t ||
             el->element_id() == Morpheme_t )
           && el->refcount() > 0 ){
        xmlNode *t = XmlNewNode( foliaNs(), "wref" );

    if ( c->parent() &&
         !( c->element_id() == WordReference_t
            || c->element_id() == Word_t
            || c->element_id() == Morpheme_t
            || c->element_id() == Phoneme_t ) ) {

etc.

Maybe it is a good idea to create a property to select those special cases: Word, Morpheme and Phoneme would carry a property such as REFERABLE. That would make the code clearer, faster and more robust.
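
As a rough illustration of the idea, here is a minimal sketch in Python. It assumes a class-level boolean flag named REFERABLE (the name floated above, not an existing attribute in folia.py), and the classes and the `collect_targets` helper are simplified stand-ins, not the real library classes. Each element type declares once whether it may be used as a Wref, and call sites test the flag instead of repeating the `isinstance` list:

```python
class AbstractElement:
    # Hypothetical class-level flag: False unless a subclass opts in.
    REFERABLE = False

    def __init__(self, id=None):
        self.id = id


class Word(AbstractElement):
    REFERABLE = True


class Morpheme(AbstractElement):
    REFERABLE = True


class Phoneme(AbstractElement):
    REFERABLE = True


class Sentence(AbstractElement):
    # Inherits REFERABLE = False, so it is never treated as a Wref target.
    pass


def collect_targets(children):
    """Hypothetical helper: keep only children that may be referenced as Wrefs."""
    return [c for c in children if c.REFERABLE]


children = [Word("w1"), Sentence("s1"), Phoneme("ph1"), Morpheme("m1")]
target_ids = [c.id for c in collect_targets(children)]
```

With a single flag, a check cannot silently omit one of the three types (as may have happened with Phoneme in the `layer.select` call), and supporting a new referable element only requires setting the flag on its class.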

proycon commented 5 years ago

That's a good idea indeed, I'll think of something.

proycon commented 5 years ago

I added a property that will be called WREFABLE in the C++ code.