rkwahile opened this issue 6 years ago
conceived to address the need to simplify the - - experience. As it stands today, the process is overly complicated for the average customer, requiring the accurate completion of forms that demand a high level of detail about the - -. A lack of accurate information in these forms can result in the package being detained or "- -" for further inspection. This directly affects the timeliness of delivery and erodes customer trust and satisfaction.
The objective is to design and develop models that can be deployed in a scalable big data framework for consumption across the enterprise, both in batch mode and in real time.
Hadoop: 1. Data Storage
Project Architecture: Ingest data from multiple sources.
Ingest > Big Data Hive (staging) > cleaning and transformation (refinement and optimized layers) > Python scripts to remove stop words and extract meaningful descriptions from the data (frequency, cleaned description, layman term) > Elasticsearch (indexing) > microservices
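The Python refinement step above can be sketched roughly as follows. The stop-word list, the field names (`raw_desc`, `cleaned_desc`, `frequency`), and the cleaning rules are illustrative assumptions, not the project's actual scripts:

```python
import re
from collections import Counter

# Illustrative stop-word list; a real pipeline would use a fuller set
# (e.g. from NLTK) plus domain-specific noise words.
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "or", "in", "to", "with"}

def clean_description(raw_desc: str) -> dict:
    """Lower-case, strip punctuation, drop stop words, and return the
    cleaned description with per-token frequencies (hypothetical schema)."""
    tokens = re.findall(r"[a-z0-9]+", raw_desc.lower())
    meaningful = [t for t in tokens if t not in STOP_WORDS]
    return {
        "raw_desc": raw_desc,
        "cleaned_desc": " ".join(meaningful),
        "frequency": dict(Counter(meaningful)),
    }

doc = clean_description("Box of spare parts for the engine, with bolts and bolts")
```

The resulting document (raw text, cleaned text, token frequencies) would then be indexed into Elasticsearch for the microservices to query.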
Role: Design and develop data pipelines for the sources.
Feature-2 (F2): Build a reusable Elasticsearch components framework
F2-ARE-2-ELK: Build a framework for data replication between two Elasticsearch clusters running in an Active-Active multi-cloud scenario.
F2-ARE-2-ELK user stories:
As an ELK developer, I need to find a way to write data into two Active-Active multi-cloud ELK clusters using a tactical and cost-effective approach.
As an ELK developer, I need to investigate which version of Elasticsearch GoldenGate supports, because GoldenGate uses the native Java client for Elasticsearch.
As an ELK developer, I need to investigate GoldenGate for Apache Kafka; it is very likely that we will send data from GoldenGate to a Kafka messaging queue.
As an ELK developer, I need to research the Kafka configuration for replication across multiple data centers.
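For the cross-data-center Kafka research item, Kafka's MirrorMaker 2 is a common starting point: it replicates topics between clusters and supports the Active-Active pattern. A minimal `connect-mirror-maker.properties` sketch, with placeholder cluster aliases and bootstrap addresses:

```properties
# Logical aliases for the two clusters (placeholders)
clusters = cloudA, cloudB

cloudA.bootstrap.servers = kafka-a1:9092,kafka-a2:9092
cloudB.bootstrap.servers = kafka-b1:9092,kafka-b2:9092

# Replicate in both directions for an Active-Active layout
cloudA->cloudB.enabled = true
cloudA->cloudB.topics = .*
cloudB->cloudA.enabled = true
cloudB->cloudA.topics = .*

# Replication factor for mirrored and internal topics
replication.factor = 3
```

MirrorMaker 2 prefixes mirrored topics with the source cluster alias (e.g. `cloudA.orders` on cloudB), which is what prevents replication loops in the two-way setup.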
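For the first story, the cheapest tactical approach is often a simple dual-write from the ingestion layer: the producer writes each document to both clusters independently and records per-cluster failures for replay. A minimal sketch, using hypothetical write callables as stand-ins for the two Elasticsearch clients:

```python
from typing import Callable, Dict, List, Tuple

def dual_write(
    doc: Dict,
    writers: List[Tuple[str, Callable[[Dict], None]]],
) -> Dict[str, bool]:
    """Write one document to every cluster. A failure on one cluster does
    not block the other; the returned per-cluster status lets the caller
    queue failed writes for replay (retry mechanism not shown)."""
    status = {}
    for name, write in writers:
        try:
            write(doc)
            status[name] = True
        except Exception:
            status[name] = False
    return status

# Usage with in-memory stand-ins for the two cluster clients:
cloud_a, cloud_b = [], []
result = dual_write(
    {"id": 1, "desc": "spare parts"},
    [("cloud-a", cloud_a.append), ("cloud-b", cloud_b.append)],
)
```

In a real deployment the write callables would wrap the bulk-index calls of the two cluster clients; dual-write trades replication-lag guarantees for simplicity and cost, which matches the "tactical and cheapest" framing of the story.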