lapps / vocabulary-pages

DSL files and templates used to generate the LAPPS WS-EV pages.
Apache License 2.0

What do non-LIF services produce? #57

Open marcverhagen opened 7 years ago

marcverhagen commented 7 years ago

We have always taken the produces field in the metadata to tell us what kind of vocabulary items a service generates. So we say vocab.lappsgrid.org/Token, assuming that the LIF structure contains annotations like

{ id: tok01, @type: vocab.lappsgrid.org/Token, start: 1, end: 12, ... }

But we say this even for services that do not generate LIF, for example the GATE Named Entity Recognizer. This has recently become more problematic because GATE NER produces Person and Organization and the like, and the vocab does not have those anymore. We could have getMetadata() return NamedEntity instead.

In any case, we may have to think this through a bit more, and we have to at least be clear about what produces means. In a way, if we say GATE NER produces NamedEntity annotations, what we really mean is that it produces something (GATE format with Person and Organization types) and that this something can be translated into LIF with NamedEntity annotations.
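To make the "can be translated into" reading concrete, here is a minimal sketch of how a produces value could be derived from a service's native output types. The mapping table and the function name produces_for are hypothetical and not part of any LAPPS API. The URI form follows the vocab.lappsgrid.org pattern used above, and only the two GATE types named in this thread are listed.

```python
# Hypothetical table: native GATE annotation types mapped to the WS-EV
# category they would translate into when converted to LIF.
NAMED_ENTITY = "http://vocab.lappsgrid.org/NamedEntity"

GATE_TO_VOCAB = {
    "Person": NAMED_ENTITY,
    "Organization": NAMED_ENTITY,
}


def produces_for(native_types):
    """Return the sorted, de-duplicated vocabulary URIs that the given
    native types translate into. Types without a mapping are skipped."""
    return sorted({GATE_TO_VOCAB[t] for t in native_types if t in GATE_TO_VOCAB})


if __name__ == "__main__":
    # Both native types collapse into the single NamedEntity category.
    print(produces_for(["Person", "Organization"]))
```

Under this reading, produces describes the output after a hypothetical conversion to LIF, not the annotations the service literally emits.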

This is also an issue for WebLicht services since those do not produce LIF either.

keighrim commented 7 years ago

Correct me if I'm thinking too naively, but I think it's okay to return "NamedEntity" as produces metadata as long as we specify that the output format (discriminator) is something other than LIF. Even though the GATE tool returns ORG, PER, ..., they all fall into the category of http://vocab.lappsgrid.org/NamedEntity.html, at least conceptually. Isn't this what the LAPPS WS-EV is about?

marcverhagen commented 7 years ago

No correction from me, since I tend to agree with that. But I do want us to think about this a little bit. At the very least, this needs to be explained well somewhere in the LIF specifications or some other relevant spot.

ksuderman commented 7 years ago

I think that in simple cases like the GATE NER or the WebLicht tokenizer, simply using URLs from the WS-EV is the best we can do, even though technically that is not what the services produce. However, I think we are going to have to give more thought to what produces means and how it is structured. For example, services that create multiple annotation types versus services that create one annotation type from a set of types.
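One possible way to capture that distinction is sketched below. It is purely illustrative: the Produces class and the "all_of" / "one_of" modes are assumptions made for this example and do not exist in the LAPPS metadata schema.

```python
from dataclasses import dataclass
from typing import FrozenSet

TOKEN = "http://vocab.lappsgrid.org/Token"
NAMED_ENTITY = "http://vocab.lappsgrid.org/NamedEntity"


@dataclass(frozen=True)
class Produces:
    """Hypothetical structured 'produces' value.

    mode == "all_of": the service creates every listed type.
    mode == "one_of": the service creates exactly one of the listed types.
    """
    mode: str
    types: FrozenSet[str]

    def __post_init__(self):
        if self.mode not in ("all_of", "one_of"):
            raise ValueError(f"unknown mode: {self.mode}")

    def may_emit(self, uri: str) -> bool:
        # The type could appear in the output.
        return uri in self.types

    def guarantees(self, uri: str) -> bool:
        # The type is certain to be among the created types.
        if uri not in self.types:
            return False
        return self.mode == "all_of" or len(self.types) == 1


# A service that creates both annotation types.
both = Produces("all_of", frozenset({TOKEN, NAMED_ENTITY}))
# A service that creates one type chosen from a set.
either = Produces("one_of", frozenset({TOKEN, NAMED_ENTITY}))
```

With such a structure, a consumer that needs NamedEntity could rely on both.guarantees(NAMED_ENTITY) but would only know that either.may_emit(NAMED_ENTITY).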