zbmed-semtec / mock_recommendation_system

GNU General Public License v3.0
Mock Recommendation System

This repository contains code for a mock recommendation system built for internal learning purposes. Its goal is to understand the STELLA architecture and its potential for integration with recommendation systems in production. The system uses the RELISH corpus.

Table of Contents

  1. About the RELISH Corpus
  2. Technical Architecture
  3. Database Schema
  4. Developer Documentation

About the RELISH Corpus

The Mock Recommendation System uses the RELISH corpus as its primary dataset. RELISH stands for "RElevant LIterature SearcH". It is an expert-curated database specifically designed for benchmarking document similarity in biomedical literature.

The RELISH corpus comprises an expert-curated collection of biomedical literature documents along with assessments of their similarity to other documents within the corpus. Each document is identified by its unique PubMed Identifier (PMID), a standardized identifier used for referencing biomedical literature.

Data Source and Characteristics

Version 1 of the RELISH corpus was retrieved from its corresponding FigShare record on 24 January 2022. It is structured as a JSON file containing pairs of PMIDs along with their relevance assessments with respect to other PMIDs in the corpus.
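To illustrate how such a file could be traversed, the sketch below flattens records into (query PMID, assessed PMID, label) triples and counts the labels. The field names (`pmid`, `response`), the label names, and the PMIDs in the sample are assumptions made for illustration; check them against the actual RELISH file before reuse.

```python
import json
from collections import Counter

# Hypothetical sample mimicking an assumed layout: one record per query PMID,
# with assessed PMIDs grouped under a relevance label. Field names, label
# names, and PMIDs are made up for illustration.
SAMPLE = """
[
  {"pmid": "111", "response": {"relevant": ["222", "333"], "partial": ["444"], "irrelevant": []}},
  {"pmid": "222", "response": {"relevant": ["111"], "partial": [], "irrelevant": ["555"]}}
]
"""


def iter_pairs(records):
    """Yield (query_pmid, assessed_pmid, label) for every assessed pair."""
    for record in records:
        for label, pmids in record["response"].items():
            for assessed in pmids:
                yield record["pmid"], assessed, label


records = json.loads(SAMPLE)
pairs = list(iter_pairs(records))
label_counts = Counter(label for _, _, label in pairs)
print(len(pairs), dict(label_counts))
```

Because `iter_pairs` iterates over whatever labels appear under `response`, it does not hard-code the set of relevance classes.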

Relevance Assessment

For each pair of PMIDs, the relevance is categorized into three main classes:

Input Data

Based on the RELISH JSON file, we created two input files, 'relish_text.jsonl' and 'relish_recoms.jsonl'.

These files were created using the code in this folder. Please refer to this documentation if you are interested in executing the scripts and creating the input files yourself.

NOTE: For ease of use, the processed input data is hosted on Google Drive and is retrieved using a bash script within the Docker container.

Technical Architecture

The Mock Recommendation System is a Flask-based web application running inside Docker containers, with a SQLite3 database backend. The application provides a simple recommendation interface, styled using CSS and HTML.

The Docker setup uses Docker Compose to manage two containers: one for the web application and one for Nginx and Gunicorn.

[Figure: architecture diagram]

Database Schema

The SQLite3 database schema consists of two tables: Publications and Recommendations.

Publications Table

The data from 'relish_text.jsonl' is used to populate this table.

Recommendations Table

The data from 'relish_recoms.jsonl' is used to populate this table.
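A minimal sketch of how two such tables could be created and populated from JSONL input using Python's built-in `sqlite3` module. The column names (`pmid`, `title`, `abstract`, `recommended_pmid`, `relevance`), the JSONL keys, and the sample rows are hypothetical; the repository's actual schema may differ.

```python
import io
import json
import sqlite3


def load_jsonl(stream):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# In-memory stand-ins for relish_text.jsonl and relish_recoms.jsonl.
# All rows and keys below are made up for illustration.
text_file = io.StringIO(
    '{"pmid": "111", "title": "Paper A", "abstract": "Abstract A"}\n'
    '{"pmid": "222", "title": "Paper B", "abstract": "Abstract B"}\n'
)
recoms_file = io.StringIO(
    '{"pmid": "111", "recommended_pmid": "222", "relevance": 2}\n'
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publications (
    pmid     TEXT PRIMARY KEY,
    title    TEXT,
    abstract TEXT
);
CREATE TABLE Recommendations (
    pmid             TEXT REFERENCES Publications(pmid),
    recommended_pmid TEXT REFERENCES Publications(pmid),
    relevance        INTEGER
);
""")

# Named placeholders let each parsed JSON object be passed directly as a dict.
conn.executemany(
    "INSERT INTO Publications VALUES (:pmid, :title, :abstract)",
    load_jsonl(text_file),
)
conn.executemany(
    "INSERT INTO Recommendations VALUES (:pmid, :recommended_pmid, :relevance)",
    load_jsonl(recoms_file),
)
conn.commit()

# Look up the titles of publications recommended for PMID 111.
rows = conn.execute(
    """
    SELECT p.title
    FROM Recommendations AS r
    JOIN Publications AS p ON p.pmid = r.recommended_pmid
    WHERE r.pmid = ?
    """,
    ("111",),
).fetchall()
print(rows)
```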

[Figure: database schema diagram]

Developer Documentation

To deploy the Mock Recommendation System on an Ubuntu server, follow these steps:

1. Installing Docker

2. Cloning the Repository

Clone the Mock Recommendation System repository from GitHub:

Using HTTPS:

git clone https://github.com/zbmed-semtec/mock_recommendation_system.git
cd mock_recommendation_system

Using SSH:

Ensure you have set up SSH keys in your GitHub account.

git clone git@github.com:zbmed-semtec/mock_recommendation_system.git
cd mock_recommendation_system

3. Creating a virtual environment

To create and activate a virtual environment within your repository, run the following commands:

python3 -m venv .venv 
source .venv/bin/activate

4. Building and Running the Docker Container

To build and start the containers, run the following command:

sudo docker compose up -d

To check whether the containers are running as expected, run the following command:

sudo docker container ps -a
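If you prefer to script this check, the sketch below parses the name and status columns of `docker container ps -a`, using Docker's `--format` option with the `{{.Names}}` and `{{.Status}}` template fields. The container names in the sample are hypothetical, and the helper treats any status beginning with "Up" as running, which is a simplifying assumption.

```python
import subprocess


def parse_status(output):
    """Map container name -> True if its status string starts with 'Up'."""
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        states[name] = status.startswith("Up")
    return states


def container_states():
    """Run `docker container ps -a` with a tab-separated name/status format.

    May need elevated privileges (the README uses sudo), depending on
    how Docker is set up on the host.
    """
    result = subprocess.run(
        ["docker", "container", "ps", "-a",
         "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_status(result.stdout)


# Made-up sample output; the container names are hypothetical.
sample = "mock_web\tUp 2 minutes\nmock_nginx\tExited (1) 5 seconds ago\n"
print(parse_status(sample))
```

On a host with Docker available, calling `container_states()` returns the same kind of dictionary for the real containers.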

5. Accessing the System

Once the system is up and running, you can access it using your web browser at the server's IP address.

Update the server_name directive in the nginx.conf file to match your server's IP address:

server {

    listen 80;
    server_name <server's IP address>;
    client_max_body_size 200M;

    location / {
        proxy_pass http://web:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
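As a quick alternative to opening a browser, reachability can be checked from a script. This is a minimal sketch using only the Python standard library; the helper name `is_reachable` and the placeholder address are assumptions, so replace the URL with your server's IP address.

```python
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError and HTTPError (4xx/5xx) are OSError subclasses, as are
        # connection and timeout errors.
        return False


# Placeholder address: replace with your server's IP address.
SERVER_URL = "http://127.0.0.1/"
print(SERVER_URL, "reachable:", is_reachable(SERVER_URL, timeout=2.0))
```

A result of False only says that no successful HTTP response came back; the container status and logs are the next places to look.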

6. Retrieving System Logs

Docker handles the log files using volumes, so the log directory and file are created on your host machine. The log file is named 'system.log'. To access the logs, navigate to the logs directory in your project folder on the host machine.
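To inspect only the most recent entries, a small helper like the one below can be used. It is a sketch that assumes a plain-text log with one entry per line; the demo runs on a temporary file with made-up content standing in for the system.log file in the logs directory.

```python
import tempfile
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file, keeping at most n in memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


# Demo on a temporary file standing in for the real log (made-up content).
with tempfile.TemporaryDirectory() as tmp:
    demo_log = Path(tmp) / "system.log"
    demo_log.write_text("line 1\nline 2\nline 3\n", encoding="utf-8")
    last_two = tail(demo_log, n=2)

print(last_two)
```

For the real log, pass the path to system.log inside the logs directory of your project folder.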