mattnmorgan / ECU-19-Redis

NSF grant project for the CSCI department chair

ECU NSF Project

Title: ECU-19-Redis
Author: Matthew Morgan
Supervisor: Venkat Gudivada
Date: Started January 2019 (updated 26 March 2019)

Description

This is a research project funded by an NSF grant awarded to East Carolina University. It is overseen by Dr. Venkat Gudivada and developed by Matthew Morgan under his guidance. The project involves database management systems such as Redis and PostgreSQL, and utilities such as ElasticSearch.

Redis and Boolean Retrieval

This task centered on creating a 3-master, 3-worker Redis cluster, graciously hosted by ECU, and generating Redis documents from a subset of the Gutenberg corpora after some pre-processing to remove extraneous data. Sample queries were run on this cluster, and tests were performed to verify data integrity when a master node fails over.
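The Boolean retrieval idea named in this section's title can be illustrated with a minimal sketch in plain Python. This is not the project's code: the sample documents, the tokenizer, and every function name below are assumptions made for illustration. Each term maps to the set of document ids that contain it; in Redis, one plausible layout would keep each such set as a Redis set (filled with SADD) and answer AND, OR, and NOT queries with SINTER, SUNION, and SDIFF.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Ids of documents containing every term (set intersection)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Ids of documents containing at least one term (set union)."""
    return set().union(*(index.get(term, set()) for term in terms))


def boolean_not(index, all_ids, term):
    """Ids of documents that do not contain the term (set difference)."""
    return set(all_ids) - index.get(term, set())


# Hypothetical sample documents, not taken from the Gutenberg subset.
docs = {
    "doc1": "The whale swam beneath the ship",
    "doc2": "The ship left the harbor at dawn",
    "doc3": "A whale was seen near the harbor",
}
index = build_index(docs)

print(sorted(boolean_and(index, "whale", "harbor")))
print(sorted(boolean_or(index, "ship", "dawn")))
print(sorted(boolean_not(index, docs, "whale")))
```

Keeping one set per term means a conjunctive query only touches the posting sets of its own terms, which is why set intersection is a natural fit for this kind of retrieval.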

Software:

Apache Lucene and Sample Queries

Subtask 01: Sample Queries on Gutenberg

This task centered on creating an Apache Lucene program that generates documents as field-value pairs, run on a smaller subset of the Gutenberg corpora formatted similarly to the Cranfield corpora. It was written in Java and required executing a variety of queries, including, but not limited to, Boolean, term, and disjunction max queries. (A program from the summer of 2018 was used as a starting point for this task.)
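Of the query types listed above, the disjunction max query is the least self-explanatory, so a small sketch of its scoring rule may help. It is written in Python for brevity and does not use Lucene's API; the function name and the sample scores are assumptions. The rule it illustrates is that a document's score is the best score among the matching subqueries, plus a tie-breaker multiplier times the scores of the other matching subqueries.

```python
def dis_max_score(subquery_scores, tie_breaker=0.0):
    """Combine per-subquery scores the way a disjunction max query does.

    The result is the highest matching score plus tie_breaker times the sum
    of the remaining matching scores. A score of 0 means "did not match".
    """
    matching = [score for score in subquery_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical scores for one document: the term matched strongly in the
# title field and weakly in the body field.
title_score, body_score = 2.0, 0.5

print(dis_max_score([title_score, body_score]))
print(dis_max_score([title_score, body_score], tie_breaker=0.5))
```

With a tie-breaker of 0, only the best-matching field counts; raising it lets documents that match in several fields rank slightly above documents that match in just one.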

Software:

Subtask 02: Sample Queries on Bibliography

This task centered on modifying the aforementioned Lucene program so that it generates documents from the bibliography fields of a provided document corpus. Before this could be done, the bibliography had to be cleaned using a script from GitHub.

Software:

Computer Science Corpora and MongoDB, ElasticSearch, and Neo4J

...

Software:

Other Resources

Notes

If sudo pip install does not work, try python3 -m pip install <package> instead. To install the needed packages locally (for the current user only), append the --user flag to the installation command.