inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

If only I can build ontologies (knowledge bases) from tags ... #3704

Open Baytars opened 1 year ago

Baytars commented 1 year ago

Is your feature request related to a problem? Please describe. The "knowledge base" is a very useful feature for creating ontologies. However, it may originally have been designed to work with externally built ontologies, so I find it difficult to build an ontology from tags, which is how I expected this tool to work.

Describe the solution you'd like

Describe alternatives you've considered I could build the ontology knowledge base manually, but concept linking becomes so daunting when there are millions of texts to link that you eventually give up.

Additional context The implementation may be quite complex, but it will be worth it, considering the large amount of concept-linking work it would save.

reckart commented 1 year ago

You can use internal and external knowledge bases, but as you put it, there are lots of ways to improve this functionality.

I have opened a separate issue for the problem with the feature editor shrinking to nothingness: https://github.com/inception-project/inception/issues/3708

reckart commented 1 year ago

> The knowledge base manager can update URIs of the instances, so when instances are concept linked, the URIs point to existing web pages that they can visit.

In most (externally) created KBs, the concept IRI is simultaneously a URI resolvable by the browser (e.g. Wikidata). For concepts created within INCEpTION, this is not the case though. Concept linking works by storing the concept IRI in the annotation. If you would change a concept IRI in the KB, then this would destroy the link of the annotation to the KB concept. One could consider, though, introducing a dedicated concept property that points to an external resource further describing the concept.
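To make the broken-link problem concrete, here is a minimal Python sketch. It is a toy model and not INCEpTION's actual storage: the `kb` dictionary, the `identifier` key, the `dangling` helper, and the `example.org` IRIs are all assumptions made for illustration.

```python
# Toy model (assumption): the KB maps concept IRIs to labels, and each
# annotation stores the IRI of the concept it is linked to.
kb = {"http://example.org/kb#c1": "Berlin"}
annotations = [
    {"span": (0, 6), "identifier": "http://example.org/kb#c1"},
    {"span": (40, 46), "identifier": "http://example.org/kb#c1"},
]


def dangling(kb, annotations):
    """Return the annotations whose stored IRI no longer exists in the KB."""
    return [a for a in annotations if a["identifier"] not in kb]


# Before the rename, every annotation resolves to a KB concept.
links_before = dangling(kb, annotations)

# Rename the concept IRI in the KB only, without touching the annotations.
kb["http://example.org/kb#berlin"] = kb.pop("http://example.org/kb#c1")

# Afterwards, both annotations still carry the old IRI and no longer resolve.
broken = dangling(kb, annotations)
print(len(links_before), len(broken))
```

The annotations themselves are unchanged; only the lookup fails, which is why a rename in the KB alone silently loses the links.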

Baytars commented 1 year ago

> For concepts created within INCEpTION, this is not the case though. Concept linking works by storing the concept IRI in the annotation. If you would change a concept IRI in the KB, then this would destroy the link of the annotation to the KB concept.

Maybe similar to code refactoring, changing the IRI accordingly can be achieved. Perhaps INCEpTION could internally use a hidden, constant concept IRI that it auto-generates and that only it recognizes, alongside an external, visible IRI that is displayed and can be edited.

reckart commented 1 year ago

> Maybe similar to code refactoring, changing the IRI accordingly can be achieved.

In theory, yes, but it requires a significant number of updates and is a disruptive operation in a multi-user environment.
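A rough Python sketch of what such a "refactoring" rename would have to do, under the same toy assumptions (a dictionary as the KB, annotations as dictionaries with an `identifier` key, hypothetical `example.org` IRIs). The function `rename_concept` is invented for illustration and does not exist in INCEpTION.

```python
def rename_concept(kb, documents, old_iri, new_iri):
    """Rename a concept and rewrite every annotation that refers to it.

    Returns the number of annotations that had to be updated.
    """
    if old_iri not in kb:
        raise KeyError(old_iri)
    if new_iri in kb:
        raise ValueError(f"{new_iri} already exists")

    kb[new_iri] = kb.pop(old_iri)

    updated = 0
    # Every document has to be scanned, because any of them may hold a link.
    for doc_annotations in documents.values():
        for annotation in doc_annotations:
            if annotation["identifier"] == old_iri:
                annotation["identifier"] = new_iri
                updated += 1
    return updated


kb = {"http://example.org/kb#c1": "Berlin", "http://example.org/kb#c2": "Paris"}
documents = {
    "doc1.txt": [{"span": (0, 6), "identifier": "http://example.org/kb#c1"}],
    "doc2.txt": [
        {"span": (3, 9), "identifier": "http://example.org/kb#c1"},
        {"span": (20, 25), "identifier": "http://example.org/kb#c2"},
    ],
}

updated = rename_concept(
    kb, documents, "http://example.org/kb#c1", "http://example.org/kb#berlin"
)
print(updated)
```

Even in this toy version the cost grows with the number of documents and annotations. In a real multi-user setting the scan would presumably also have to cover every annotator's copy of every document while nobody is editing them, which is what makes the operation disruptive.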

reckart commented 1 year ago

> Some annotated text spans may point to the same instance identifier, so some of them should be moved to statements to become aliases instead. This way, duplicate instances are removed.

What do you mean by "some of them should be moved to statements to become aliases instead"? From my perspective, it should be perfectly fine if multiple spans are linked to the same concept IRI - it means that they all refer to the same concept. What is the "statement" that you want to introduce here?

Baytars commented 1 year ago

> What is the "statement" that you want to introduce here?

It is INCEpTION that introduced "statement". In the KB, the Class and Instance panels have buttons labeled "+ New statement". That's the "statement" I'm referring to. More precisely, it is an "annotation property" from the perspective of OWL. See the post "Synonyms, IRIs, and Labels in OWL and SKOS" by Michael DeBellis.

> From my perspective, it should be perfectly fine if multiple spans are linked to the same concept IRI - it means that they all refer to the same concept.

I agree, but imagine that you want to export the KB to an OWL file and want the concept to carry the annotation property "synonyms". Ideally, the synonyms would be gathered automatically from these spans; that is what I want to achieve, whatever the implementation turns out to be. I don't want these spans to be mapped to multiple concepts (with multiple IRIs) and treated as distinct classes/instances. They have to be merged into a single class/instance with a single IRI and the annotation property "synonyms" (see the post I mentioned above) holding the different expressions of the concept, so that the same concept does not occupy multiple spots when you visualize the KB. Worse, if the duplicates have relations to other concepts, too many arrows will be drawn and clutter the visualization. I just don't want that to happen.
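As a rough illustration of the idea, the Python sketch below groups the surface forms of linked spans by concept IRI and turns them into synonym triples. Everything here is an assumption for illustration: the `(surface text, concept IRI)` input format, the helper names, the `example.org` IRIs, and the choice of `skos:altLabel` as the synonym property. It is not an existing INCEpTION export.

```python
from collections import defaultdict

# Assumption: synonyms are exported with the SKOS alternative-label property.
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def collect_synonyms(linked_spans):
    """Group distinct surface forms by the concept IRI they are linked to."""
    by_concept = defaultdict(list)
    for surface, iri in linked_spans:
        if surface not in by_concept[iri]:
            by_concept[iri].append(surface)
    return dict(by_concept)


def to_alt_label_triples(synonyms):
    """Emit one (subject, predicate, object) triple per synonym."""
    return [
        (iri, SKOS_ALT_LABEL, surface)
        for iri, surfaces in synonyms.items()
        for surface in surfaces
    ]


# Hypothetical linked spans: (covered text, concept IRI stored in the annotation).
linked_spans = [
    ("NYC", "http://example.org/kb#new-york"),
    ("New York City", "http://example.org/kb#new-york"),
    ("NYC", "http://example.org/kb#new-york"),
    ("Big Apple", "http://example.org/kb#new-york"),
    ("Boston", "http://example.org/kb#boston"),
]

synonyms = collect_synonyms(linked_spans)
triples = to_alt_label_triples(synonyms)
for triple in triples:
    print(triple)
```

Because the spans already share one IRI, the concept stays a single node; the different wordings end up as property values on that node rather than as separate classes or instances.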

reckart commented 1 year ago

wrt "statement": aaaah... actually, this statement concept comes from RDF (https://www.w3.org/TR/rdf-concepts/#section-data-model) - although the INCEpTION KB statement is a bit more complex and could actually consist of multiple triples, as might be necessary for "reified statements".
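For readers unfamiliar with the terms, here is a small Python sketch of a plain RDF statement and its reified form, using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The `reify` helper, the statement identifier, and the `example.org` IRIs are made up for illustration; how INCEpTION represents its KB statements internally is not shown here.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, triple):
    """Describe one triple as a resource, using four triples about it."""
    subject, predicate, obj = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


# A plain RDF statement is a single (subject, predicate, object) triple.
statement = (
    "http://example.org/kb#new-york",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "NYC",
)

# The reified form needs several triples to say the same thing, but it gives
# the statement an identifier that further triples can refer to.
reified = reify("http://example.org/kb#stmt1", statement)

# Hypothetical extra triple about the statement itself (its source document).
qualified = reified + [
    ("http://example.org/kb#stmt1", "http://example.org/kb#source", "doc1.txt")
]
print(len(reified), len(qualified))
```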

In INCEpTION, you usually configure one property to be the "label property" - the value of this property is shown to the user, instead of the concept's IRI, wherever a concept is used.

There is the option to define "Additional Matching Properties" which would contain synonyms. E.g. the primary label property could be http://www.w3.org/2004/02/skos/core#prefLabel and http://www.w3.org/2000/01/rdf-schema#label could be added to the "Additional Matching Properties".
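The interplay of the two settings can be sketched in a few lines of Python. This is only a toy lookup over hand-written triples, not INCEpTION's search implementation; the constants `LABEL_PROPERTY` and `ADDITIONAL_MATCHING_PROPERTIES`, the two helper functions, and the `example.org` concepts are assumptions that mirror the configuration described above.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Assumed configuration, mirroring the example in the text.
LABEL_PROPERTY = SKOS_PREF_LABEL
ADDITIONAL_MATCHING_PROPERTIES = [RDFS_LABEL]

# Hypothetical KB content as (subject, predicate, object) triples.
triples = [
    ("http://example.org/kb#new-york", SKOS_PREF_LABEL, "New York City"),
    ("http://example.org/kb#new-york", RDFS_LABEL, "NYC"),
    ("http://example.org/kb#new-york", RDFS_LABEL, "Big Apple"),
    ("http://example.org/kb#boston", SKOS_PREF_LABEL, "Boston"),
]


def display_label(iri):
    """Return the label-property value for a concept, or the IRI as fallback."""
    for subject, predicate, obj in triples:
        if subject == iri and predicate == LABEL_PROPERTY:
            return obj
    return iri


def find_concepts(query):
    """Match the query against the label and the additional matching properties."""
    searchable = {LABEL_PROPERTY, *ADDITIONAL_MATCHING_PROPERTIES}
    needle = query.casefold()
    hits = []
    for subject, predicate, obj in triples:
        if predicate in searchable and needle in obj.casefold() and subject not in hits:
            hits.append(subject)
    return hits


hits = find_concepts("big apple")
print([display_label(iri) for iri in hits])
```

In this sketch a synonym only widens what the search can find; the user still sees the single preferred label, so the concept is not duplicated.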