mattnmorgan / ECU-19-Redis

NSF Grant project for the CSCI department chair

Generate Retrieval Indices #1

Closed mattnmorgan closed 5 years ago

mattnmorgan commented 5 years ago

[Removed original tasklist for better progress tracking]

Edit 01: (02-18-2019) After discussion with Gudivada, the positional index will be set aside for now in favor of expanding the boolean retrieval capabilities of the application. This includes the following tasks:

Edit 02: (02-22-2019) After some research into how to force Redis to send documents to specific nodes using hash tags, and after implementing BR querying, issues have arisen:

mattnmorgan commented 5 years ago

At present, the application supports boolean retrieval indexing, and the metadata for each corpus document (title, author, language(s)) is indexed and sent to the Redis server successfully. This information can also be queried. A positional index has been set up, but its data is not yet being migrated to the server, so the positional index cannot be queried yet either.

mattnmorgan commented 5 years ago

There is a commit (d58c505e83f6547bfaeb25ef28a1bd5055d398bd) that would allow positional indices to be pushed to the Redis server, but it fails with an error stating that the AOF file on the cluster could not be updated because of an input/output error. Connecting to the cluster nodes over SSH is also not possible due to an ssh_authentication error. Jason @ ECU was emailed for assistance.

[Screenshot: 2019-02-18, 1:28:52 PM]

mattnmorgan commented 5 years ago

Commit e97c2b89e8c50360460168288edf930a381f52a1 modifies redis.md to explain that data can be 'forced' to hash to a certain slot by enclosing a portion of the key in braces. From the file itself:

Keys can be forced to evaluate to a specific slot by enclosing a portion of the key within {}. For example, {0}key would evaluate to slot 13907, since {0} would be used for the evaluation. In a default 3-master, 3-worker cluster, this would send data to the third master - aka the one that holds the last 5,400 slots.

This information is available at https://redis.io/commands/cluster-keyslot, with the feature being referenced as 'hash-tagging'.
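The slot quoted above can be checked offline. The following is a minimal sketch, assuming the standard Redis Cluster rule (CRC16/XMODEM of the key, modulo 16384); the helper names `hash_tag` and `key_slot` are illustrative and not part of the project. Strictly speaking, only the substring between the first `{` and the following `}` is hashed, so for `{0}key` the hashed value is `0`, not `{0}`.

```python
import binascii

SLOTS = 16384  # total number of hash slots in a Redis Cluster


def hash_tag(key: str) -> str:
    """Return the part of the key that is hashed (hypothetical helper)."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Only a non-empty tag counts; a key such as "{}key" is hashed whole.
        if end != -1 and end > start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Compute the cluster slot for a key (hypothetical helper)."""
    # binascii.crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(hash_tag(key).encode("utf-8"), 0) % SLOTS


print(key_slot("{0}key"))  # 13907
```

Any key that shares the `{0}` tag lands in the same slot, which is what makes it possible to steer related documents to one node.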

mattnmorgan commented 5 years ago

Progress was made on implementing and, or, and not boolean retrieval queries. Sample queries were tested, including [blue|green life] !park, !dark|[orange ball], and !dog !cat !animal. The third query does not return the correct result; the first two, however, exactly match a handwritten evaluation.

Query syntax is documented in query.py, and a new document, docset, is sent to each Redis node for use in computing the not of terms. For some reason, 'cls' crashes the program when used as a query; this is documented in issue #5.
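To illustrate why not needs something like docset, here is a rough sketch using plain Python sets in place of Redis. The index contents, document ids, and helper names are all hypothetical, and it assumes space-separated terms are combined with and; the actual syntax and evaluation live in query.py and may differ.

```python
# Hypothetical inverted index: term -> set of document ids.
index = {
    "dog": {1, 2},
    "cat": {2, 3},
    "animal": {1, 3, 4},
    "park": {4, 5},
}

# Stand-in for docset: the ids of every document in the corpus.
docset = {1, 2, 3, 4, 5, 6}


def postings(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return index.get(term, set())


def not_term(term):
    # An inverted index only records where a term occurs, so the
    # complement has to be taken against the full docset.
    return docset - postings(term)


def and_all(*sets):
    """Intersection of all operands, starting from the full docset."""
    result = set(docset)
    for s in sets:
        result &= s
    return result


def or_all(*sets):
    """Union of all operands."""
    result = set()
    for s in sets:
        result |= s
    return result


# "!dog !cat !animal" read as NOT dog AND NOT cat AND NOT animal.
result = and_all(not_term("dog"), not_term("cat"), not_term("animal"))
print(sorted(result))  # [5, 6]
```

By De Morgan's law the same result is the docset minus the union of the three posting sets, which can serve as a cross-check for a query like the third one above.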