If only I can build ontologies (knowledge bases) from tags ...

Baytars commented 1 year ago

Is your feature request related to a problem? Please describe.

The "knowledge base" is a very useful feature for creating ontologies. However, it may have originally been designed to work with externally built ontologies, so I find it difficult to build an ontology from tags, which is how I expected this tool to work.

Describe the solution you'd like

The annotator creates several tags while doing the annotation job.
A knowledge base is created to house all the tags created, ...
of which the manager can move the hierarchy and organize the ontology tree.
The tags are classes, while the annotated text spans are instances.
Some annotated text spans may point to the same instance identifier, so some of them should be moved to statements to become aliases instead. This way, duplicate instances are removed.
The knowledge base manager can update URIs of the instances, so when instances are concept linked, the URIs point to existing web pages that they can visit. -> https://github.com/inception-project/inception/issues/3867
In the annotation job, the annotated texts are updated as well, with their "identifier" features pointing to the respective instances. The recommender learns from this concept linking and performs concept linking for us, so the annotators only need to check its correctness. (Note that, strangely, in the current version of INCEpTION, 26.5, the concept input box shrinks to nearly zero width. Clicking the input box and entering a search string is quite difficult, and I can't see what characters I have typed. I tried both Firefox and the Chromium-based Edge browser, and they show the same behavior. Please fix it! ==> https://github.com/inception-project/inception/issues/3708)
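
To make the proposed workflow concrete, here is a minimal sketch of how tags, spans, and aliases could be grouped. It is only an illustration and not INCEpTION's implementation: the `Span` record, the `build_kb` function, and the sample data are hypothetical, and it assumes every annotated span already carries a tag and an instance identifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical record for one annotated text span."""
    text: str        # surface form that was annotated
    tag: str         # tag chosen by the annotator (becomes a class)
    identifier: str  # instance identifier the span points to


def build_kb(spans):
    """Group spans into classes -> instances -> label and aliases.

    The first surface form seen for an identifier becomes the label;
    every other distinct surface form becomes an alias.
    """
    kb = defaultdict(dict)
    for span in spans:
        instances = kb[span.tag]
        entry = instances.setdefault(
            span.identifier, {"label": span.text, "aliases": []}
        )
        if span.text != entry["label"] and span.text not in entry["aliases"]:
            entry["aliases"].append(span.text)
    return dict(kb)


spans = [
    Span("New York City", "Location", "inst-1"),
    Span("NYC", "Location", "inst-1"),
    Span("New York City", "Location", "inst-1"),
    Span("Ada Lovelace", "Person", "inst-2"),
]
kb = build_kb(spans)
print(kb["Location"]["inst-1"])
```

Spans that share an identifier end up as a single instance, so the repeated "New York City" span does not create a duplicate and "NYC" is kept as an alias.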

Describe alternatives you've considered

I could build the ontology knowledge base manually, but concept linking becomes so daunting when you have millions of texts to link that you eventually give up.

Additional context

The implementation may be quite complex, but it will be worth it, considering the great amount of concept-linking work it saves.

reckart commented 1 year ago

You can use internal and external knowledge bases, but as you put it, there are lots of ways to improve this functionality.

I have opened a separate issue for the problem with the feature editor shrinking to nothingness: https://github.com/inception-project/inception/issues/3708

reckart commented 1 year ago

> The knowledge base manager can update URIs of the instances, so when instances are concept linked, the URIs point to existing web pages that they can visit.

In most (externally) created KBs, the concept IRI is simultaneously a URI resolvable by the browser (e.g. Wikidata). For concepts created within INCEpTION, this is not the case though. Concept linking works by storing the concept IRI in the annotation. If you would change a concept IRI in the KB, then this would destroy the link of the annotation to the KB concept. One could consider, though, introducing a dedicated concept property that points to an external resource further describing the concept.
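
A minimal sketch of why the link breaks, using plain dictionaries rather than INCEpTION's actual data model. The IRIs, the `seeAlso` key, and the `resolve` helper are made up for illustration.

```python
# Hypothetical in-memory KB: concept IRI -> properties.
kb = {
    "http://example.org/kb#concept-1": {"label": "Berlin"},
}

# An annotation stores only the concept IRI, not a copy of the concept.
annotation = {"text": "Berlin", "identifier": "http://example.org/kb#concept-1"}


def resolve(annotation, kb):
    """Return the concept an annotation links to, or None if the link is dead."""
    return kb.get(annotation["identifier"])


assert resolve(annotation, kb) is not None

# Changing the IRI in the KB alone leaves the annotation pointing at nothing.
renamed = dict(kb)
renamed["https://example.org/wiki/Berlin"] = renamed.pop(
    "http://example.org/kb#concept-1"
)
print(resolve(annotation, renamed))  # None

# Alternative: keep the IRI stable and store the external page in a property.
kb["http://example.org/kb#concept-1"]["seeAlso"] = "https://example.org/wiki/Berlin"
print(resolve(annotation, kb)["seeAlso"])
```

With the second approach the annotation never has to change, because the identifier it stores stays the same.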

Baytars commented 1 year ago

> For concepts created within INCEpTION, this is not the case though. Concept linking works by storing the concept IRI in the annotation. If you would change a concept IRI in the KB, then this would destroy the link of the annotation to the KB concept.

Maybe similar to code refactoring, changing the IRI accordingly can be achieved. Perhaps internally INCEpTION uses a hidden and constant concept IRI that it auto generates and only it can recognize, and has an external and apparent IRI that is to be displayed and edited.

reckart commented 1 year ago

> Maybe similar to code refactoring, changing the IRI accordingly can be achieved.

In theory, yes, but it requires a significant number of updates and is a disruptive operation in a multi-user environment.
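
As a rough sketch of what such a "refactoring" would involve, assuming annotations are simple records holding the concept IRI (the `rename_concept` function and the data are hypothetical, not INCEpTION code):

```python
def rename_concept(kb, annotations, old_iri, new_iri):
    """Rename a concept and rewrite every annotation that refers to it.

    Returns the number of annotations that had to be updated.
    """
    if new_iri in kb:
        raise ValueError(f"{new_iri} already exists")
    kb[new_iri] = kb.pop(old_iri)
    updated = 0
    for annotation in annotations:
        if annotation["identifier"] == old_iri:
            annotation["identifier"] = new_iri
            updated += 1
    return updated


kb = {"http://example.org/kb#c1": {"label": "Berlin"}}
annotations = [
    {"doc": "a.txt", "identifier": "http://example.org/kb#c1"},
    {"doc": "b.txt", "identifier": "http://example.org/kb#c1"},
    {"doc": "c.txt", "identifier": "http://example.org/kb#c2"},
]
count = rename_concept(
    kb, annotations, "http://example.org/kb#c1", "http://example.org/kb#berlin"
)
print(count)
```

Every annotation in every document of every user that mentions the old IRI has to be found and rewritten, and any of those documents may be open for editing while the rename runs.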

reckart commented 1 year ago

> Some annotated text spans may point to the same instance identifier, so some of them should be moved to statements to become aliases instead. This way, duplicate instances are removed.

What do you mean by "some of them should be moved to statements to become aliases instead"? From my perspective, it should be perfectly fine if multiple spans are linked to the same concept IRI - it means that they all refer to the same concept. What is the "statement" that you want to introduce here?

Baytars commented 1 year ago

> What is the "statement" that you want to introduce here?

It is INCEpTION that introduced "statement". In the KB, in the Class and Instance panels, there are buttons labeled "+ New statement". That's the "statement" I'm referring to. More precisely, it is an "annotation property" from the perspective of OWL. See the post "Synonyms, IRIs, and Labels in OWL and SKOS" by Michael DeBellis.

> From my perspective, it should be perfectly fine if multiple spans are linked to the same concept IRI - it means that they all refer to the same concept.

I agree, but imagine that you want to export the KB to an OWL file and you want each concept to have the annotation property "synonyms". Perhaps the synonyms could be gathered automatically from these spans, which is what I want to achieve, whatever the implementation turns out to be. I don't want these spans to be mapped to multiple concepts (with multiple IRIs) and treated as distinct classes/instances. They have to be merged into a single class/instance with a single IRI and the annotation property "synonyms" (see the post I mentioned above) holding the different expressions of the concept, so that when you visualize the concepts, the same concept doesn't occupy multiple spots. What's worse, if the duplicates have relations to other concepts, too many arrows will be drawn and clutter the visualization. I just don't want that to happen.
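
A sketch of the kind of export I have in mind, written as plain string building so it needs no libraries. Using `skos:prefLabel` and `skos:altLabel` for the label and the synonyms is my assumption here; the `to_turtle` and `escape` helpers and the example IRI are hypothetical.

```python
def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(concepts):
    """Serialize concepts with one preferred label and any synonyms."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for iri, entry in concepts.items():
        pairs = [f'skos:prefLabel "{escape(entry["label"])}"']
        pairs += [f'skos:altLabel "{escape(alias)}"' for alias in entry["aliases"]]
        lines.append(f"<{iri}>")
        lines.append("    " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines)


# One concept with one IRI, no matter how many different spans were linked to it.
concepts = {
    "http://example.org/kb#nyc": {
        "label": "New York City",
        "aliases": ["NYC", "The Big Apple"],
    },
}
turtle = to_turtle(concepts)
print(turtle)
```

The different expressions found in the text become synonyms of one node, so a visualization draws the concept and its relations only once.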

reckart commented 1 year ago

wrt "statement": aaaah... actually, this statement concept comes from RDF (https://www.w3.org/TR/rdf-concepts/#section-data-model) - although the INCEpTION KB statement is a bit more complex and could actually consist of multiple triples, as might be necessary for "reified statements".
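
For illustration, an RDF statement is a subject-predicate-object triple, and the standard RDF reification vocabulary describes one such triple with four more triples. The sketch below models triples as plain tuples; the `reify` helper, the example IRIs, and the `source` qualifier are hypothetical and do not show how INCEpTION stores statements internally.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, triple):
    """Describe a triple as an rdf:Statement resource (four triples)."""
    subject, predicate, obj = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


triple = (
    "http://example.org/kb#nyc",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "New York City",
)
reified = reify("http://example.org/kb#stmt-1", triple)

# Once the statement has its own identifier, further triples can qualify it.
reified.append(
    ("http://example.org/kb#stmt-1", "http://example.org/kb#source", "annotator-1")
)
for t in reified:
    print(t)
```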

In INCEpTION, you usually configure one property to be the "label property" - the value of this property is shown to the user instead of the concept's IRI whenever a concept is used.

There is the option to define "Additional Matching Properties" which would contain synonyms. E.g. the primary label property could be http://www.w3.org/2004/02/skos/core#prefLabel and http://www.w3.org/2000/01/rdf-schema#label could be added to the "Additional Matching Properties".
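
Conceptually, the lookup then behaves roughly like the sketch below. This is a simplified illustration and not the actual INCEpTION search code: the `find_concepts` function and the dictionary-based KB are hypothetical, and only the two property IRIs come from the configuration described above.

```python
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def find_concepts(kb, query, label_property, additional_matching_properties=()):
    """Return (IRI, display label) pairs whose label or synonyms match the query."""
    query = query.casefold()
    hits = []
    for iri, properties in kb.items():
        for prop in (label_property, *additional_matching_properties):
            values = properties.get(prop, [])
            if any(query in value.casefold() for value in values):
                # Always display the primary label, whichever property matched.
                display = properties.get(label_property, [iri])[0]
                hits.append((iri, display))
                break
    return hits


kb = {
    "http://example.org/kb#nyc": {
        PREF_LABEL: ["New York City"],
        RDFS_LABEL: ["NYC", "The Big Apple"],
    },
}
only_label = find_concepts(kb, "nyc", PREF_LABEL)
with_synonyms = find_concepts(kb, "nyc", PREF_LABEL, [RDFS_LABEL])
print(only_label)
print(with_synonyms)
```

Searching for "nyc" finds nothing when only the label property is consulted, but finds the concept (shown under its primary label) once the synonym property is added to the matching properties.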

If only I can build ontologies (knowledge bases) from tags ... #3704