opensemanticsearch / open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
https://opensemanticsearch.org
GNU General Public License v3.0

"Import Status: Running file import" stuck. #482

Open Pooja1905 opened 4 weeks ago

Pooja1905 commented 4 weeks ago

I am facing an issue similar to the one reported in https://github.com/opensemanticsearch/open-semantic-search/issues/282. Can anyone guide me on this? I have checked the Solr error logs, syslog, etc., and there don't seem to be any errors. The CPU of my EC2 instance is busy, not idle. I have deleted and recreated the indexes a couple of times, but the total count shown under "Running file imports ..." does not change. I have 95-100 GB of data (mixed media: PDFs, images, videos, audio files, PNGs, CSVs, etc.).

I have left it alone for 2 days now and it hasn't made a dent in the numbers, even though CPU utilization averages 80-95%.
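One way to confirm whether the index is growing at all, independent of the status page, would be to poll the Solr document count a few times and compare. The following is only a minimal sketch: the host, port, and core name in `SOLR_SELECT_URL` are assumptions about a default local install and may differ on this setup, and the helper names (`parse_num_found`, `fetch_num_found`, `sample_counts`, `is_progressing`) are made up for illustration.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import List

# Assumed values (not confirmed for this install): Solr on localhost:8983
# and a core named "opensemanticsearch". Adjust both as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/opensemanticsearch/select"


def parse_num_found(payload: str) -> int:
    """Extract the total document count from a Solr select JSON response."""
    return int(json.loads(payload)["response"]["numFound"])


def fetch_num_found(base_url: str = SOLR_SELECT_URL) -> int:
    """Ask Solr for a match-all query with zero rows and return numFound."""
    query = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    with urllib.request.urlopen(base_url + "?" + query, timeout=30) as resp:
        return parse_num_found(resp.read().decode("utf-8"))


def sample_counts(samples: int = 3, interval_seconds: float = 60.0) -> List[int]:
    """Collect several document counts, waiting between consecutive samples."""
    counts = []
    for i in range(samples):
        counts.append(fetch_num_found())
        if i < samples - 1:
            time.sleep(interval_seconds)
    return counts


def is_progressing(counts: List[int]) -> bool:
    """Return True if the count grew between the first and the last sample."""
    return len(counts) >= 2 and counts[-1] > counts[0]
```

Calling `is_progressing(sample_counts())` on the server would then show whether documents are still being added while the status page appears stuck. A flat count alongside high CPU usage would suggest the workers are busy on a few expensive files rather than idle.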


I have installed Open Semantic Search on an EC2 instance running Debian, with 16 GB RAM and 400 GB storage. The JVM heap is about 4 GB.

Let me know if I need to provide any other details.

I am stumped; any direction or help is much appreciated.