Creating a new project - Githubissues

saggu commented 5 years ago

When a new project is created in myDIG, the following should happen at the backend.

Create a new hbase table for the project: `<project_name>_catalog`.

Catalog table's schema

  rowid: the id of the row in hbase, designed as: `<dataset>_doc_id`. This allows quickly fetching all docs under that dataset.
  document: the cdr json document
  date_added: date when this document was added to the project
  date_processed: date when this document was scheduled to be processed by etk
  file_name: the user uploaded file this json belongs to
  dataset: dataset specified by user for this document
  status: 0 - NEW, 1 - SCHEDULED to be processed
  identifier: the id of the document
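
A minimal sketch of the catalog row layout described above, in plain Python with no HBase client involved. It assumes that `doc_id` in the rowid stands for the document's identifier and that the fields map one-to-one to entries of a dict; the function names `catalog_row_key`, `catalog_row` and `rows_for_dataset` are hypothetical and not taken from dig-etl-engine.

```python
from datetime import datetime, timezone

# Status codes as listed in the catalog schema.
STATUS_NEW = 0        # document added, not yet scheduled
STATUS_SCHEDULED = 1  # document scheduled to be processed by etk


def catalog_row_key(dataset, doc_id):
    """Build the catalog rowid as <dataset>_<doc_id> (assumed reading of the schema)."""
    return f"{dataset}_{doc_id}"


def catalog_row(dataset, doc_id, document, file_name):
    """Assemble one catalog row as a dict keyed by the schema's field names."""
    return {
        "rowid": catalog_row_key(dataset, doc_id),
        "document": document,                                  # the cdr json document
        "date_added": datetime.now(timezone.utc).isoformat(),  # when it joined the project
        "date_processed": None,                                # unset until scheduled for etk
        "file_name": file_name,
        "dataset": dataset,
        "status": STATUS_NEW,
        "identifier": doc_id,
    }


def rows_for_dataset(rows, dataset):
    """Emulate a row-prefix scan: keep rows whose rowid starts with '<dataset>_'."""
    prefix = f"{dataset}_"
    return [row for row in rows if row["rowid"].startswith(prefix)]
```

One thing to watch with this key design: a prefix lookup for dataset `news` would also match rows of a dataset named `news_archive`, so dataset names that extend each other with an underscore may need extra care.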

Create a second hbase table for the project to store the etk status of its docs: `<project_name>_etk_status`.

Schema:

rowid: row id for the row, designed as: `<project_name>_<dataset>_doc_id`. Allows us to quickly fetch all the documents for a given project and under a specific dataset.
date_last_processed: date when the document was processed by etk (successfully or not)
status: 0 - etk error
        1 - sandpaper error
        2 - successfully processed by etk and sent to the out kafka topic
added_by: dig_etl_engine or housekeeping
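
The etk status schema above could be modelled roughly as follows. This is a sketch under assumptions: the enum member names, `etk_status_row_key` and `etk_status_row` are hypothetical, and `doc_id` is again taken to mean the document's identifier. Only the numeric codes and the two `added_by` values come from the schema.

```python
from enum import IntEnum


class EtkStatus(IntEnum):
    """Numeric status codes from the etk_status schema (member names are assumed)."""
    ETK_ERROR = 0
    SANDPAPER_ERROR = 1
    PROCESSED = 2  # successfully processed by etk and sent to the out kafka topic


# The two writers named in the schema.
ADDED_BY = ("dig_etl_engine", "housekeeping")


def etk_status_row_key(project_name, dataset, doc_id):
    """Build the rowid as <project_name>_<dataset>_<doc_id>."""
    return f"{project_name}_{dataset}_{doc_id}"


def etk_status_row(project_name, dataset, doc_id, status, added_by, date_last_processed):
    """Assemble one etk_status row, rejecting unknown status codes or writers."""
    if added_by not in ADDED_BY:
        raise ValueError(f"added_by must be one of {ADDED_BY}, got {added_by!r}")
    return {
        "rowid": etk_status_row_key(project_name, dataset, doc_id),
        "date_last_processed": date_last_processed,
        "status": int(EtkStatus(status)),  # raises ValueError for codes outside 0..2
        "added_by": added_by,
    }
```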

saggu commented 5 years ago

Added code to create the table when a project is created. Closing.

saggu commented 5 years ago

Updated the hbase table schema: each project will have 2 tables, catalog and etk_status. Keeping etk_status separate makes it easier to remove the tables when deleting a project.
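
A rough illustration of why per-project tables keep deletion simple: both table names can be derived from the project name alone. The helper names are hypothetical, and a plain Python set stands in for the list of tables in hbase; no real client calls are shown.

```python
def project_table_names(project_name):
    """Return the two per-project table names: catalog and etk_status."""
    return [f"{project_name}_catalog", f"{project_name}_etk_status"]


def create_project_tables(existing_tables, project_name):
    """Register both tables for a new project in the stand-in set."""
    existing_tables.update(project_table_names(project_name))


def delete_project_tables(existing_tables, project_name):
    """Drop both tables of a project; other projects' tables are left untouched."""
    existing_tables.difference_update(project_table_names(project_name))
```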

saggu commented 5 years ago

Implemented

usc-isi-i2 / dig-etl-engine

Creating a new project #253
