inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

No results from external ElasticSearch repository #1124

Closed fer-git closed 5 years ago

fer-git commented 5 years ago

Describe the bug: I would like to use ElasticSearch as a document repository. I enabled it in the settings, but when I query for some words on the search page, I do not get any results.

Currently, I have both INCEpTION version 0.8.3 and ElasticSearch version 6.7.0 running on my local computer.

My Document Repositories setup:
Remote URL: http://localhost:9200
Index Name: finance
Search Path: _search
Object Type: para

I am able to query the documents through the Python API:

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])
res = es.search(index="finance", body={"query": {"match_all": {}}})
for hit in res["hits"]["hits"][:4]:
    print(hit)

I got:

{'_index': 'finance', '_type': 'para', '_id': '0', '_score': 1.0, '_source': {'text': 'Major snowstorm lashes Great Plains, heads east'}}
{'_index': 'finance', '_type': 'para', '_id': '14', '_score': 1.0, '_source': {'text': '"People are getting stuck in the middle of the roadway, it\'s just that deep," Kellerman said.'}}
{'_index': 'finance', '_type': 'para', '_id': '19', '_score': 1.0, '_source': {'text': "Drought-stricken farmers in the Great Plains, one of the world's largest wheat-growing areas, welcomed the moisture brought by the storm, although experts said more rain or snow would be needed to ensure healthy crops."}}
{'_index': 'finance', '_type': 'para', '_id': '22', '_score': 1.0, '_source': {'text': 'Shares in PT Visi Media Asia Tbk, controlled by the politically connected Bakrie family, jumped as much as 20.4 percent on Monday after the company received takeover proposal from Indonesia-based CT Corp.'}}

Is there a specific setup that I need to follow?

reckart commented 5 years ago

When I run your Python sample against one of our indexes, I get back a slightly different structure:

{'_index': 'gigaword', '_type': 'texts', '_id': 'NYT_ENG_2000', '_score': 1.0, '_source': {'doc': {'text': "..."}, 'metadata': {'id': 'NYT_ENG_2000', 'language': 'en', 'source': '', 'timestamp': 'Tue Mar 05 14:45:10 CET 2019', 'uri': ''}}}

In particular, there is a doc JSON element around the text. I think that is why you do not get any results in your case, because INCEpTION's query code currently requests the field doc.text explicitly:

        highlightNode.putPOJO("fields", mapper.createObjectNode()
            .putPOJO("doc.text", emptyNode));

reckart commented 5 years ago

I guess that might be an additional setting we should add to the configuration?
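To illustrate what such a setting could look like, here is a minimal Python sketch. It is not INCEpTION's code: the function names `build_highlight_clause` and `has_field` are hypothetical, and the only detail taken from the thread is that the highlight clause requests the field doc.text. The sketch makes the field name a parameter and checks whether a document's `_source` contains the requested path at all.

```python
import json


def build_highlight_clause(text_field="doc.text"):
    """Build a highlight clause for a configurable field (hypothetical helper).

    The default mirrors the hard-coded "doc.text" quoted in the thread.
    """
    return {"fields": {text_field: {}}}


def has_field(source, dotted_path):
    """Return True if dotted_path (e.g. "doc.text") resolves inside a _source dict.

    Simplification: each dot is treated as one level of object nesting.
    """
    node = source
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


# The flat structure from the original report and the nested one INCEpTION expects.
flat = {"text": "Major snowstorm lashes Great Plains, heads east"}
nested = {"doc": {"text": "Major snowstorm lashes Great Plains, heads east"}}

print(json.dumps(build_highlight_clause()))
print(has_field(flat, "doc.text"), has_field(nested, "doc.text"))
print(has_field(flat, "text"))
```

With a configurable field, the flat index from the report could be searched by passing `"text"` instead of relying on the default.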

fer-git commented 5 years ago

I just modified the document structure in ElasticSearch from:

{"text":...}

into:

{"doc": {"text": ...}}

I can confirm that the query now returns results. I would suggest adding documentation on the recommended document structure, or making the searched field configurable.
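The restructuring described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `wrap_source` is a hypothetical helper, it assumes each flat document has a top-level "text" key, and it leaves any other top-level keys where they are. Writing the converted documents back to the index is left out.

```python
def wrap_source(source):
    """Move a top-level "text" value under "doc" (hypothetical helper).

    Documents that already have doc.text are returned as an unchanged copy.
    Raises KeyError if the document has neither "text" nor doc.text.
    """
    doc = source.get("doc")
    if isinstance(doc, dict) and "text" in doc:
        return dict(source)  # already in the nested shape
    # Keep every other top-level key and nest only the text.
    wrapped = {key: value for key, value in source.items() if key != "text"}
    wrapped["doc"] = {"text": source["text"]}
    return wrapped


flat = {"text": "Major snowstorm lashes Great Plains, heads east"}
nested = wrap_source(flat)
print(nested)
```

The function does not modify its input and is safe to apply twice, so a partially converted index can be processed again without double nesting.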

Thanks for the quick response.

reckart commented 5 years ago

Yep, we'll add this to the docs. Thanks for prodding us :)