Title: ECU-19-Redis
Author: Matthew Morgan
Supervisor: Venkat Gudivada
Date: Started January 2019 (updated 26 March 2019)
This is a research project supported by the NSF grant provided to East Carolina University. It is overseen by Dr. Venkat Gudivada and carried out by Matthew Morgan under his guidance. The project involves database management systems such as Redis and PostgreSQL, and utilities such as Elasticsearch.
This task primarily centered on the creation of a 3-master, 3-worker Redis cluster, graciously hosted by ECU, and the generation of Redis documents from a subset of the Gutenberg corpus after some pre-processing to remove extraneous data. Sample queries were run on this cluster, and tests were performed to ensure data integrity on the failover of a master node.
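A minimal sketch of the document-generation step described above, not the project's actual script. It assumes that the "extraneous data" is the Project Gutenberg header and footer delimited by `*** START OF` and `*** END OF` marker lines, that each document is stored as one Redis hash, and that keys look like `doc:<id>`; all three are illustrative assumptions, as are the function names.

```python
def strip_gutenberg_boilerplate(text):
    """Return the text between the '*** START OF' and '*** END OF' marker lines.

    Assumption: the pre-processing removes the Gutenberg header and footer.
    If no markers are found, the whole text is returned (stripped).
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1
        elif line.startswith("*** END OF"):
            end = i
            break
    return "\n".join(lines[start:end]).strip()


def store_document(client, doc_id, fields):
    """Write one document as a Redis hash, one HSET call per field.

    `client` is any object with an hset(key, field, value) method, such as
    a rediscluster.RedisCluster instance. The 'doc:<id>' key layout is
    hypothetical.
    """
    key = "doc:{}".format(doc_id)
    for name, value in fields.items():
        client.hset(key, name, value)
    return key


# Possible usage against a running cluster (host and port are assumptions):
#
#   from rediscluster import RedisCluster
#   client = RedisCluster(
#       startup_nodes=[{"host": "127.0.0.1", "port": "7000"}],
#       decode_responses=True,
#   )
#   store_document(client, 1, {"title": "Example", "text": "..."})
```

Passing the client in as a parameter keeps the sketch independent of any particular cluster, so the same function can be exercised with a stand-in object before pointing it at the real nodes.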
Software:
sudo pip install redis-py-cluster, providing the ability to interface with a Redis cluster
sudo pip install redis, the base client; redis-py-cluster builds on it, but makes changes for a cluster
This task primarily centered on the creation of an Apache Lucene program that generates documents as field-value pairs, run on a smaller subset of the Gutenberg corpus formatted similarly to the Cranfield corpus. It was written in Java and required executing a variety of queries, including but not limited to boolean, term, and disjunction max queries. (A program from the summer of 2018 was used as a starting point for this task.)
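To make the field-value layout concrete, here is a small sketch that splits Cranfield-style text into one dictionary per document. It is written in Python purely for illustration; the project's program is a Java/Lucene program and this is not its code. The sketch assumes the usual Cranfield markers (`.I` for the document id, then `.T`, `.A`, `.B`, `.W` for title, author, bibliography, and body); the output field names are hypothetical.

```python
# Cranfield section markers mapped to (hypothetical) field names.
FIELD_NAMES = {".T": "title", ".A": "author", ".B": "bibliography", ".W": "text"}


def parse_cranfield(text):
    """Split Cranfield-style text into a list of field-value dictionaries."""
    docs = []
    current = None  # dictionary for the document being read
    field = None    # name of the field that following lines belong to
    for line in text.splitlines():
        stripped = line.strip()
        if line.startswith(".I "):
            # A new document starts; its id follows the marker.
            current = {"id": line[3:].strip()}
            docs.append(current)
            field = None
        elif current is not None and stripped in FIELD_NAMES:
            field = FIELD_NAMES[stripped]
            current[field] = ""
        elif current is not None and field is not None:
            # Continuation lines are joined with single spaces.
            current[field] = (current[field] + " " + stripped).strip()
    return docs
```

Each resulting dictionary corresponds to one document whose keys would become field names and whose values would become field contents in an index.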
Software:
core and queryparser libraries, used from the binary distribution (JARs)
This task primarily centered on modifying the aforementioned Lucene program to generate documents from the bibliography fields of a provided document corpus instead. Before this could be done, the bibliography had to be cleaned using a script from GitHub.
Software:
core and queryparser binaries
clean_bib project on GitHub, available at https://github.com/ZacCat/clean_bib
sudo pip install bibtexparser is a dependency...
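As a hedged illustration of turning bibliography entries into field-value pairs, the sketch below flattens entries of the shape produced by the bibtexparser 1.x API (`bibtexparser.load`, which returns a database whose `entries` attribute is a list of dictionaries). It does not reproduce what clean_bib does; the whitespace normalization and the function names are assumptions made for the example.

```python
def entry_to_fields(entry):
    """Flatten one BibTeX entry dictionary into lowercase field-value pairs."""
    fields = {}
    for name, value in entry.items():
        # Collapse the line breaks and repeated spaces common in .bib files.
        cleaned = " ".join(str(value).split())
        if cleaned:  # drop fields that are empty after cleaning
            fields[name.lower()] = cleaned
    return fields


def load_entries(path):
    """Read a .bib file with bibtexparser 1.x and return flattened entries."""
    # Imported here so entry_to_fields can be used without the package.
    import bibtexparser

    with open(path, encoding="utf-8") as handle:
        database = bibtexparser.load(handle)
    return [entry_to_fields(entry) for entry in database.entries]
```

The flattened dictionaries have the same field-value shape as the documents in the earlier Lucene task, which is what would let the same indexing program be reused with bibliography fields.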
Software:
sudo pip install beautifulsoup4 for web-scraping
sudo pip install requests for fetching files from websites
If sudo pip install doesn't work, you may try python3 -m pip install <package> instead. To install the needed packages locally (for the current user only), simply append the --user flag to the installation command.