usc-isi-i2 / dig-etl-engine

Download DIG to run on your laptop or server.
http://usc-isi-i2.github.io/dig/
MIT License

Adding new documents to mydig #255

Open saggu opened 5 years ago

saggu commented 5 years ago
  1. When a user uploads a new file (JSON lines, CSV, or any other format), catalog the documents in the HBase table <project_name>_catalog as described in #253. Initially, date_processed is empty and status is 0.

  2. If the uploaded file is a JSON lines file, catalog each individual JSON object. If it is a CSV, TSV, Excel, or similar file, create a wrapping JSON object and catalog that wrapping object instead.

  3. Update the table dataset_view as described in #254. This table will only be updated when documents are uploaded to or deleted from myDIG.
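
A rough sketch of steps 1 and 2, to make the cataloging rule concrete. This is only an illustration, not the actual implementation: the function name `make_catalog_entries`, the `doc_id`, `document`, `file_name`, and `raw_content` fields, and the extension check are all assumptions. The real row layout is whatever #253 defines; only `date_processed` and `status` come from this issue. The sketch builds plain dicts and does not talk to HBase.

```python
import hashlib
import json


def make_catalog_entries(file_name, raw_text):
    """Build catalog rows for one uploaded file (hypothetical helper)."""
    if file_name.endswith((".jl", ".jsonl")):
        # JSON lines: every non-empty line is its own document.
        docs = [json.loads(line) for line in raw_text.splitlines() if line.strip()]
    else:
        # CSV, TSV, Excel, etc.: a single wrapping JSON object for the file.
        # The wrapper fields used here are assumptions, not the real schema.
        docs = [{"file_name": file_name, "raw_content": raw_text}]

    entries = []
    for doc in docs:
        serialized = json.dumps(doc, sort_keys=True)
        entries.append({
            # Assumed id scheme: hash of the serialized document.
            "doc_id": hashlib.sha256(serialized.encode("utf-8")).hexdigest(),
            "document": doc,
            "date_processed": "",  # empty until the document is processed
            "status": 0,           # initial status per this issue
        })
    return entries
```

Each returned dict would then be written as one row in <project_name>_catalog. Note that a binary format such as Excel would need to be read and encoded before it could be placed in a wrapping object, which this sketch does not handle.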