Processing documents

The user has set the desired number of documents to process, n, for each of k datasets.

For each of the k datasets:
   - scan the HBase table <project_name>_catalog for n documents
   - update date_processed and status for those n documents in <project_name>_catalog
   - add the n documents to the _in topic so that etk can process them
   - depending on the status reported by etk and/or sandpaper, insert or update the row for each of the n documents in the table etk_status
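The loop above can be sketched in Python to make the data flow concrete. This is not dig-etl-engine code. Table is a hypothetical in-memory stand-in for an HBase table (a real deployment would go through an HBase client), a plain list stands in for the _in topic, and the names process_datasets and report_status, the dataset column, and the status values "new" and "queued" are all assumptions for illustration. The sketch also assumes a single catalog table whose rows carry a dataset column.

```python
from datetime import datetime, timezone


class Table:
    """Hypothetical in-memory stand-in for an HBase table: row key -> column dict."""

    def __init__(self):
        self.rows = {}

    def scan(self, predicate, limit):
        # Return up to `limit` row keys whose columns satisfy `predicate`.
        keys = []
        for key, row in self.rows.items():
            if len(keys) == limit:
                break
            if predicate(row):
                keys.append(key)
        return keys

    def put(self, key, columns):
        # Upsert: create the row if it is missing, otherwise overwrite the given columns.
        self.rows.setdefault(key, {}).update(columns)


def process_datasets(catalog, in_topic, etk_status, datasets, n, report_status):
    """Queue up to n unprocessed documents per dataset (hypothetical sketch)."""
    for dataset in datasets:
        # 1. Scan the catalog for up to n not-yet-processed documents of this dataset.
        keys = catalog.scan(
            lambda row: row.get("dataset") == dataset and row.get("status") == "new",
            limit=n,
        )
        now = datetime.now(timezone.utc).isoformat()
        for key in keys:
            # 2. Record date_processed and the new status in the catalog.
            catalog.put(key, {"date_processed": now, "status": "queued"})
            # 3. Hand the document to etk through the _in topic.
            in_topic.append(key)
        # 4. Insert or update one etk_status row per document from the reported status.
        for key in keys:
            etk_status.put(key, {"status": report_status(key)})
```

Step 4 is collapsed into a synchronous report_status callback only to keep the sketch self-contained. In the pipeline described above, the status comes back from etk and/or sandpaper after the documents are consumed from the _in topic, so the etk_status upsert would happen later, as those reports arrive.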

usc-isi-i2 / dig-etl-engine

Processing documents #256