Open saggu opened 5 years ago
Implemented
Adding `desired` to hbase as well. The updated schema looks like this:
rowid: design- <project_name>_<dataset>. Allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
desired: number of desired docs in elasticsearch for <dataset> in <project_name>
We need to track the total number of docs added to kafka during etk processing. Updated schema:
rowid: design- <project_name>_<dataset>. Allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
desired: number of desired docs in elasticsearch for <dataset> in <project_name>
added_docs: total number of docs added to kafka for <dataset> in <project_name>
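
A minimal Python sketch of how one row in this schema could be assembled. The issue does not name a column family, so `COLUMN_FAMILY` is a hypothetical choice, and the row key is assumed to be exactly `<project_name>_<dataset>`. The helper names `make_row_key` and `make_row` are illustrative and not taken from the project.

```python
from typing import Dict

# Hypothetical column family name; the issue does not specify one.
COLUMN_FAMILY = "counts"


def make_row_key(project_name: str, dataset: str) -> str:
    """Build the row id, assumed here to be <project_name>_<dataset>."""
    return f"{project_name}_{dataset}"


def make_row(total_docs: int, desired: int, added_docs: int) -> Dict[bytes, bytes]:
    """Map the three counters onto HBase-style 'family:qualifier' columns."""
    values = {
        "total_docs": total_docs,
        "desired": desired,
        "added_docs": added_docs,
    }
    row: Dict[bytes, bytes] = {}
    for name, value in values.items():
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        # HBase stores raw bytes, so counters are encoded as decimal strings here.
        row[f"{COLUMN_FAMILY}:{name}".encode()] = str(value).encode()
    return row
```

Keeping both project and dataset in the key means a single row lookup returns all counters for one dataset, and a prefix scan on `<project_name>_` returns every dataset in a project.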
Create hbase table `dataset_view`, if it does not exist. This will be used for `TLD View` in mydig frontend. Schema:
Note: We are not going to track the desired number of docs. It will exist only as a front end concept: based on what the user has entered, that many documents will be fetched from hbase and processed.
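
The "create the table if it does not exist" step could be sketched as below. This is an assumption-laden illustration, not the project's code: it expects a connection object exposing `tables()` and `create_table(name, families)`, which is the shape of a happybase `Connection`, and the helper name `ensure_table` is hypothetical.

```python
from typing import Any, Dict


def ensure_table(connection: Any, name: str, families: Dict[str, dict]) -> bool:
    """Create table `name` if it is missing. Returns True if it was created."""
    existing = set()
    for table in connection.tables():
        # Table names may come back as bytes, so normalise them to str.
        existing.add(table.decode() if isinstance(table, bytes) else table)
    if name in existing:
        return False
    connection.create_table(name, families)
    return True
```

With a real connection this might be called as `ensure_table(conn, "dataset_view", {"counts": {}})`, where the `counts` column family is again a placeholder, since the issue does not list one.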