tattle-made / services-infrastructure


Host a single node elasticsearch reliably #4

Open dennyabrain opened 2 years ago

dennyabrain commented 2 years ago
  1. Run a single-node Elasticsearch on a Linux machine
  2. Configure automatic backup and restore features
dennyabrain commented 2 years ago

Original discussion for reference https://github.com/tattle-made/kosh-v2/discussions/8

d80ep08th commented 2 years ago

Run a single-node Elasticsearch on a Linux machine

First, make sure you have installed Docker and Docker Compose

Uninstall old versions

$  sudo apt-get remove docker docker-engine docker.io containerd runc

Set up the repository

#Update the apt package index and install packages to allow apt to use a repository over HTTPS:
$ sudo apt-get update

$ sudo apt-get install \
     ca-certificates \
     curl \
     gnupg \
     lsb-release

#Add Docker’s official GPG key:
$  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

#Use the following command to set up the stable repository
$  echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install Docker Engine

$ sudo apt-get update

$ sudo apt-get install docker-ce docker-ce-cli containerd.io

#Verify that Docker Engine is installed correctly by running the hello-world image
$  sudo docker run hello-world

Install Docker Compose

#Run this command to download the current stable release of Docker Compose:
$ sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

#Apply executable permissions to the binary
$ sudo chmod +x /usr/local/bin/docker-compose

#Check if docker-compose has been installed
$  docker-compose --version

Second, install Elasticsearch using Docker

Pulling the image

#Obtaining Elasticsearch for Docker is as simple as issuing a docker pull command against the Elastic Docker registry.
#Using version 7.6.1, because the current version of Kosh infrastructure uses that version

$ docker pull docker.elastic.co/elasticsearch/elasticsearch:7.6.1

Run a single-node Elasticsearch

#interrupt this using Ctrl+C
$ docker run -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.6.1

On another terminal/console, use the following command to check whether the Elasticsearch Docker container is running:

$ docker container ls

Then go to http://localhost:9200 to confirm Elasticsearch is responding.
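
If you prefer to check from a script, here is a minimal sketch using only the Python standard library (it assumes Elasticsearch is reachable on localhost:9200 without authentication, as in the docker run command above):

# check_es.py -- minimal sketch: confirm the single-node cluster is up
import json
import urllib.request

# Assumes Elasticsearch is listening on localhost:9200 with no authentication
with urllib.request.urlopen("http://localhost:9200/_cluster/health") as resp:
    health = json.load(resp)

print("cluster:", health["cluster_name"], "status:", health["status"])

A healthy single-node cluster normally reports a green or yellow status.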

Start Feluda as a single-node Elasticsearch daemon service on a Linux machine

Clone the tattle api repo

$ git clone https://github.com/tattle-made/tattle-api.git

Replace the contents of docker-compose.yml in the repository with the following:

version: "3.5"

services:
  store:
    container_name: systemd_es
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.1
    volumes:
      - ./.docker/es/data:/usr/share/elasticsearch/data
    ports:
      - "9300:9300"
      - "9200:9200"
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
      - http.cors.enabled=true
      - http.cors.allow-origin=http://localhost:1358,http://127.0.0.1:1358
      - http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
      - http.cors.allow-credentials=true

    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK

  queue:
    image: rabbitmq:3.8.7-management
    container_name: systemd_rabbitmq
    hostname: rabbit
    volumes:
      - ./.docker/rabbitmq/data/:/var/lib/rabbitmq/
      - ./.docker/rabbitmq/logs/:/var/log/rabbitmq/
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie"
      RABBITMQ_DEFAULT_USER: "admin"
      RABBITMQ_DEFAULT_PASS: "Admin123"
    ports:
      - 5672:5672
      - 15672:15672

  api:
    container_name: systemd_feluda_api
    build:
      context: ./src/api
      dockerfile: Dockerfile
      target: debug
    volumes:
      - ./src/api:/app
    env_file: ./src/api/development.env
    ports:
      - 7000:7000
      - 5678:5678
    command: tail -f /dev/null
    #depends_on:
    #  store:
    #    condition: service_started
    #  queue:
    #    condition: service_started

It is almost identical to the file in https://github.com/tattle-made/kosh-v2/discussions/8#discussioncomment-1811845

In the repository, in the /src/api directory, add the file development.env with the following content:

MQ_USERNAME=admin  
MQ_PASSWORD=Admin123
MQ_HOST=rabbitmq
AWS_ACCESS_KEY_ID=XXX  
AWS_SECRET_ACCESS_KEY=XXXX
AWS_SECRET_ACCESS_KEY_ID=XXXX
AWS_BUCKET=config.tattle.co.in  
S3_CREDENTIALS_PATH=google-api/tattle-api-google.json
GOOGLE_APPLICATION_CREDENTIALS="credentials.json"  
ES_USERNAME=XXXXX  
ES_PASSWORD=XXXXX  
ES_HOST=es  
KOSH_API=http://kosh_api:8000/
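
Inside the Compose network, containers reach each other by service name, so ES_HOST and MQ_HOST need to resolve to the Elasticsearch and RabbitMQ services (via the service name or a network alias). As a quick sanity check of the Elasticsearch value, a minimal sketch (not part of Feluda; it assumes the elasticsearch Python client in a 7.x version to match the 7.6.1 server) could be:

# es_check.py -- minimal sketch, not Feluda code: verify that ES_HOST resolves and responds
import os
from elasticsearch import Elasticsearch  # pip install "elasticsearch>=7,<8"

# Fall back to localhost when run outside the Compose network (assumption for illustration)
es_host = os.environ.get("ES_HOST", "localhost")
es = Elasticsearch([{"host": es_host, "port": 9200}])

if es.ping():
    print("Connected to Elasticsearch", es.info()["version"]["number"], "at", es_host)
else:
    print("Could not reach Elasticsearch at", es_host)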

chown the Elasticsearch data directory

The Elasticsearch process inside the official image runs as UID 1000, so the bind-mounted data directory must be writable by that user:

$ sudo chown -R 1000:1000 tattle-api/.docker/es/data

Bring up the containers


$ docker-compose up

#interrupt this using Ctrl+C
# This will bring up the following containers:

#Elasticsearch : Used to store searchable representations of multilingual text, images and videos.

#RabbitMQ : Used as a Job Queue to queue up long indexing jobs.

#Search Indexer : A RabbitMQ consumer that receives any new jobs that are added to the queue and processes them.

#Search Server : A public REST API to index new media and provide additional public APIs to interact with this service.

Write a daemon service

Go to /etc/systemd/system and create a file elasticsearch-feluda.service:

[Unit]
Description=Elastic Search Feluda Daemon
After=docker.service
Requires=docker.service

[Service]
Restart=always
User=root
# Group: fill in the group that should run docker-compose (systemd does not support inline comments)
Group=
# Shutdown container (if running) when unit is stopped
ExecStartPre=/usr/bin/docker-compose -f /path/to/tattle-api/docker-compose.yml down
# Start container when unit is started
ExecStart=/usr/bin/docker-compose -f /path/to/tattle-api/docker-compose.yml up
# Stop container when unit is stopped
ExecStop=/usr/bin/docker-compose -f /path/to/tattle-api/docker-compose.yml down

[Install]
WantedBy=multi-user.target

With this service, the single-node Elasticsearch and Feluda containers will start on their own if the machine restarts.

Enable and start the service

$ sudo systemctl enable elasticsearch-feluda.service

$ sudo systemctl start elasticsearch-feluda.service
#sudo systemctl stop elasticsearch-feluda.service
#sudo systemctl disable elasticsearch-feluda.service

$ sudo systemctl status elasticsearch-feluda.service

Configure Automatic Backup and Restore features

We use Boto3, the AWS SDK for Python, to create, configure, and manage AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

$ pip install boto3  # requires Python 3.6 or later

Backup Data

backup.py:

import boto3
from botocore.client import Config
from datetime import datetime
import shutil

# Fill in your AWS credentials and the bucket that will hold the backups
ACCESS_KEY_ID = ''
ACCESS_SECRET_KEY = ''
BUCKET_NAME = ''

# Directory to back up (for this setup, the bind-mounted Elasticsearch data directory)
path_to_backup = ""

s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=ACCESS_SECRET_KEY,
    config=Config(signature_version='s3v4')
)

time_stamp = datetime.now().strftime("%Y_%m_%d-%I_%M_%S_%p")

backup_filename = "backup_" + time_stamp

# make_archive returns the full path of the zip it creates (/path/to/backup/backup_<timestamp>.zip)
archive_path = shutil.make_archive('/path/to/backup/' + backup_filename, 'zip', path_to_backup)
key_path = "kosh-es/" + backup_filename + ".zip"

# Upload the freshly created archive to S3 under the kosh-es/ prefix
s3.meta.client.upload_file(archive_path, BUCKET_NAME, key_path)

Run as: $ python3 backup.py
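
To confirm that a backup actually landed in the bucket, a small sketch (same credential and bucket placeholders as in backup.py) can list the objects under the kosh-es/ prefix:

# list_backups.py -- minimal sketch: list backups uploaded under the kosh-es/ prefix
import boto3
from botocore.client import Config

ACCESS_KEY_ID = ''
ACCESS_SECRET_KEY = ''
BUCKET_NAME = ''

s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=ACCESS_SECRET_KEY,
    config=Config(signature_version='s3v4')
)

# Print the key, size and timestamp of every object whose key starts with "kosh-es/"
for obj in s3.Bucket(BUCKET_NAME).objects.filter(Prefix='kosh-es/'):
    print(obj.key, obj.size, obj.last_modified)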

Restore Data

restore.py:

import boto3
from botocore.client import Config

# Fill in your AWS credentials and the bucket the backups were uploaded to
ACCESS_KEY_ID = ''
ACCESS_SECRET_KEY = ''
BUCKET_NAME = ''

# Local file path to download the backup archive to
path_to_restore = ""
# S3 key of the backup to restore, e.g. "kosh-es/backup_<timestamp>.zip"
key_path = ""

s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=ACCESS_SECRET_KEY,
    config=Config(signature_version='s3v4')
)
s3.meta.client.download_file(BUCKET_NAME, key_path, path_to_restore)

Run as: $ python3 restore.py

What's Left?

  1. In Restore Data, restore.py currently has the name of the file to be restored hard-coded. I want to make it dynamic by taking the file name as a command-line argument, like: python3 restore.py filename

  2. In Backup Data, backup.py zips the file/folder before sending it over to the Amazon S3 bucket. Correspondingly, restore.py should unzip the downloaded archive after fetching it from the bucket. A sketch covering both points follows below.
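
A minimal sketch of how restore.py could cover both points, reusing the same credential and path placeholders (the kosh-es/ prefix matches what backup.py uploads under):

# restore_dynamic.py -- minimal sketch: python3 restore_dynamic.py backup_<timestamp>.zip
import sys
import shutil

import boto3
from botocore.client import Config

ACCESS_KEY_ID = ''
ACCESS_SECRET_KEY = ''
BUCKET_NAME = ''

# Directory to unpack the restored data into (placeholder, as in restore.py)
path_to_restore = ""

# 1. Take the backup file name from the command line instead of hard-coding it
backup_name = sys.argv[1]
key_path = "kosh-es/" + backup_name

s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=ACCESS_SECRET_KEY,
    config=Config(signature_version='s3v4')
)

# 2. Download the zip archive, then unzip it into the restore directory
s3.meta.client.download_file(BUCKET_NAME, key_path, backup_name)
shutil.unpack_archive(backup_name, path_to_restore, 'zip')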