opensemanticsearch / open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
https://opensemanticsearch.org
GNU General Public License v3.0
941 stars, 164 forks

Tags not removed upon emptying index and/or deleting whole tag out of Django Admin #432

Open dayo8 opened 2 years ago

dayo8 commented 2 years ago

Hi guys,

We've been running the latest version of OSS (open-semantic-search_22.03.04.deb) on a virtual Ubuntu server for a couple of months and are experiencing a few problems, one of which makes it very hard for us to use.

Tagging: Once you tag a doc (either manually or automatically, by choosing a query that consists of a single word), you cannot untag it anymore. Previously, a doc at least got untagged when we emptied the index or deleted the index for the specific file. That stopped working once we started using tag queries that contain more than just a folder path (tested twice now). We did a fresh install of the whole program because of this error, but it keeps happening even on the newest version. I downloaded the Django db file and opened it on my laptop, but I can't find any tags or annotations in there anymore (which makes sense, since I deleted them, but I thought they might not actually have been deleted).

Is there any other place I could manually delete the tags from? I've searched through all the folders for hours but can't find anything, and I don't really know where I should be looking. I wonder where OSS still gets the tags from.

There is another problem that might be connected to this: unless we enable the setting config['force'] = True in etc/opensemanticsearch/connector-files, changes made to a document are not recognized. The change date of the file gets updated, but the changes themselves don't show up. So naturally we always have that option enabled.
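
For reference, this is the line we mean, shown as a minimal sketch. The `config = {}` line is only a hypothetical stand-in so the snippet runs by itself; I'm assuming the real connector-files config already provides that dictionary.

```python
# Sketch of the setting in etc/opensemanticsearch/connector-files.
# Assumption: the real file already has a `config` dict in scope;
# the next line is a stand-in and is not part of our actual file.
config = {}

# With this enabled, changed documents show up again for us.
config['force'] = True
```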

I would appreciate any kind of advice.