gw-cs-sd / sd-2017-confidential-scalable-analytics

sd-2017-confidential-scaleable-analytics created by GitHub Classroom

Week 11: Integrated Java program, changed OCR API and redesigned DB. #3

Open faisalharbi opened 8 years ago

faisalharbi commented 8 years ago

This week I switched our OCR API to HPE Haven OnDemand View Document, which reads a PDF document and converts it into an HTML string. I wrote a script that parses the HTML string, extracts the relevant data, and writes it to a text file for the Java algorithm to read and process. The script also replaces the private data in the HTML string with arbitrary characters and prints the redacted HTML to the screen. I also integrated the Java program that Sean put together into our UI and wrote a script that calls the Java file and writes its result to a text file. The database was also redesigned to include tables for encrypted private data, uploaded documents, and the companies using the application.
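
For anyone following along, here is a minimal sketch of the parse, extract, and redact steps described above. It is not the project's actual script. The choice of Python, the function names, the sample HTML, and the idea that "private data" means US Social Security numbers are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: "private data" is modelled here as US Social Security numbers.
# The real project may target entirely different fields.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML string."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return the text content of the HTML, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def redact(html, mask_char="X"):
    """Replace each digit of every matched number with mask_char, keeping dashes."""
    return SSN_PATTERN.sub(
        lambda m: re.sub(r"\d", mask_char, m.group()), html
    )


def write_for_java(text, path):
    """Write the extracted text to a file for a downstream program to read."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


if __name__ == "__main__":
    # Hypothetical sample input, standing in for the API's HTML output.
    sample = "<html><body><p>Name: Jane Doe</p><p>SSN: 123-45-6789</p></body></html>"
    write_for_java(extract_text(sample), "extracted.txt")
    print(redact(sample))
```

The extracted text goes to the file unredacted because the downstream Java step is described as processing the data itself, while only the copy printed to the screen is masked. Whether that split matches the real script is a guess.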