Changes:
Fix a duplication error in the keywordjob by changing how data is loaded.
Add a reportjob that returns the most common keywords in a dataset.
Replace 1234.json with schema.json and use it in the intakejob.
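A minimal sketch of the kind of counting a reportjob like this might do. It assumes each record has already been reduced to a list of keyword strings. `top_keywords` is a hypothetical helper and does not come from this PR's code. The sketch uses `collections.Counter`, whose `most_common(n)` returns the n highest counts in descending order.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_keywords(keyword_lists: Iterable[List[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Hypothetical helper: count keywords across all records and return the n most common."""
    counts: Counter = Counter()
    for keywords in keyword_lists:
        # Each record contributes every keyword it contains once per occurrence.
        counts.update(keywords)
    # most_common sorts by count, descending; ties keep first-seen order.
    return counts.most_common(n)


if __name__ == "__main__":
    records = [["spark", "etl"], ["spark"], ["json", "spark", "etl"]]
    print(top_keywords(records, 2))
```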
cleanjob():
Remove the generated _SUCCESS file
Add more cleaning regexes
Add a function that replaces the dataset's word count with a more accurate one
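The _SUCCESS file is the empty marker that Hadoop's FileOutputCommitter (and therefore Spark) writes into an output directory when a job finishes. The following minimal sketch shows one way to delete it after the write. It assumes the output directory is on the local filesystem. `remove_success_marker` is a hypothetical name, not the function used in cleanjob(). Another option is to stop the marker from being written at all by setting the Hadoop property `mapreduce.fileoutputcommitter.marksuccessfuljobs` to `false`.

```python
from pathlib import Path


def remove_success_marker(output_dir: str) -> bool:
    """Hypothetical helper: delete the _SUCCESS marker in output_dir if present.

    Returns True if a marker was removed, False if none existed.
    Assumes a local filesystem path (not HDFS/S3).
    """
    marker = Path(output_dir) / "_SUCCESS"
    if marker.is_file():
        marker.unlink()
        return True
    return False
```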