opensemanticsearch / solr-ontology-tagger

Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri
https://opensemanticsearch.org/solr-ontology-tagger
GNU General Public License v3.0

Applying uploaded RDF ontologies not working #12

Open bhelou opened 6 years ago

bhelou commented 6 years ago

Hi,

When we upload an RDF ontology (say ontology.xml) to OSS, we have the option of applying it to existing documents. This doesn't work as expected: in solr_ontology_tagger.py, the only fields that get updated are test_xml_ss_preferred_label_ss and test_xml_ss_uri_ss. The field test_xml_ss (which is turned into a facet) is not updated. To fix this, I changed

tagdata = add_value_to_facet(facet = target_facet + '_preferred_label_ss', value = preferred_label, data=tagdata)

to

tagdata[target_facet] = preferred_label
tagdata = add_value_to_facet(facet = target_facet + '_preferred_label_ss', value = preferred_label, data=tagdata)
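To make the effect of this change easier to see, here is a minimal sketch of the three fields being filled for one document. It assumes (not confirmed by the source) that tagdata is a dict of field name to values, that the tagger handles each matched concept in a loop, and that add_value_to_facet appends to a multi-valued field. The helper below is a hypothetical stand-in, not the code from solr_ontology_tagger.py. The concept labels and URIs are made up.

```python
from typing import Dict, List, Tuple, Union

TagData = Dict[str, Union[str, List[str]]]


def add_value_to_facet(facet: str, value: str, data: TagData) -> TagData:
    # Hypothetical stand-in for the helper of the same name in
    # solr_ontology_tagger.py: append value to a multi-valued field, skipping duplicates.
    values = data.setdefault(facet, [])
    if value not in values:
        values.append(value)
    return data


def tag_concepts(target_facet: str, concepts: List[Tuple[str, str]], use_assignment: bool) -> TagData:
    # concepts: (preferred_label, uri) pairs matched in one document (assumed input shape)
    tagdata: TagData = {}
    for preferred_label, uri in concepts:
        if use_assignment:
            # Fix proposed in this issue: plain assignment to the base facet field.
            tagdata[target_facet] = preferred_label
        else:
            # Alternative: treat the base facet field as multi-valued like the others.
            tagdata = add_value_to_facet(facet=target_facet, value=preferred_label, data=tagdata)
        tagdata = add_value_to_facet(facet=target_facet + '_preferred_label_ss', value=preferred_label, data=tagdata)
        tagdata = add_value_to_facet(facet=target_facet + '_uri_ss', value=uri, data=tagdata)
    return tagdata


matches = [('Berlin', 'http://example.org/berlin'), ('Paris', 'http://example.org/paris')]
print(tag_concepts('test_xml_ss', matches, use_assignment=True)['test_xml_ss'])   # → Paris
print(tag_concepts('test_xml_ss', matches, use_assignment=False)['test_xml_ss'])  # → ['Berlin', 'Paris']
```

Under these assumptions, plain assignment keeps only the last matched label when several concepts match the same document. Appending, as with the other two fields, keeps all of them.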

Is it a reasonable fix?

Thanks! Bassam

PS: As I'm sure you know, entity_linker and solr_ontology_tagger tag documents differently. If I understand correctly, entity_linker only tags a document if the facet search returns an exact (case-insensitive) match for a label:

if str(value).lower() == queries[query]['query'].lower():
    match = True

While experimenting with OSS, I noticed that this causes problems when, for example, a dictionary term is 'hello ' (with a trailing space). The facet search returns 'hello', but entity_linker won't tag the document because 'hello' is not equal to 'hello '.
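The mismatch can be reproduced in isolation. The sketch below puts the comparison quoted above (lowercase only) next to a hypothetical normalized variant that collapses leading, trailing, and repeated whitespace before comparing. The function names are illustrative only; this is not how entity_linker is implemented.

```python
def normalize_label(text: object) -> str:
    # Collapse leading, trailing and repeated whitespace, then lowercase.
    return " ".join(str(text).split()).lower()


def labels_match(facet_value: object, label: str, normalize: bool = False) -> bool:
    if normalize:
        # Hypothetical variant: compare whitespace-normalized, lowercased strings.
        return normalize_label(facet_value) == normalize_label(label)
    # Same test as the entity_linker snippet quoted above: lowercase only.
    return str(facet_value).lower() == label.lower()


print(labels_match('hello', 'hello '))                  # → False
print(labels_match('hello', 'hello ', normalize=True))  # → True
```

Another option would be to strip labels when the dictionary or ontology is imported, so the stored term is already clean and the exact comparison can stay as it is.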