Open hhm970 opened 5 months ago
Extraction: We will use Google Forms for the daily survey; responses will be stored in Google Sheets and extracted via the Google Sheets API.
Transform: Load the data into a pandas DataFrame, performing any necessary data cleaning in the process.
Load: Insert the data carefully, ensuring every value goes into its corresponding table.
The Transform process needs hashing functionality, because storing the raw email address raises security and privacy issues.
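A minimal sketch of what the hashing step could look like, using only the standard library. The issue does not say which algorithm to use, so keyed SHA-256 (HMAC) is an assumption here, and `hash_email` and `secret_key` are hypothetical names. A keyed hash is suggested because a plain unsalted hash of an email address is easy to reverse by guessing common addresses.

```python
import hashlib
import hmac


def hash_email(email: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 hex digest of a normalised email address.

    Assumption: emails are compared case-insensitively, so the address is
    stripped and lower-cased first so the same user always maps to the
    same digest.
    """
    normalised = email.strip().lower()
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
```

The digest can then be stored in place of the raw address; the key would need to live outside the database (for example in an environment variable) for this to add any protection.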
Description
Not all user input into the database will come from tickboxes; some entries will involve free text (e.g. the user's emotions). We will need to clean the inputs on user emotions.
Possible obstacles include:
fuzzy string matching, or a dict mapping to solve this (or both?)
Required Files
./pipeline
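Returning to the emotion-cleaning question in the Description: one way to try "both" is to check a dict mapping first and fall back to fuzzy matching for typos. The sketch below uses the standard library's `difflib`; the emotion list, the synonym dict, the `clean_emotion` name, and the 0.8 cutoff are all illustrative assumptions, not decisions made in this issue.

```python
import difflib
from typing import Optional

# Hypothetical canonical emotions and known synonyms (assumed for illustration).
CANONICAL_EMOTIONS = ["happy", "sad", "angry", "anxious", "calm"]
SYNONYMS = {
    "joyful": "happy",
    "upset": "sad",
    "mad": "angry",
    "nervous": "anxious",
    "relaxed": "calm",
}


def clean_emotion(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Map free-text input to a canonical emotion, or None if nothing is close."""
    text = raw.strip().lower()
    # 1. Exact hit on a canonical value.
    if text in CANONICAL_EMOTIONS:
        return text
    # 2. Exact hit in the dict mapping.
    if text in SYNONYMS:
        return SYNONYMS[text]
    # 3. Fuzzy match against every known spelling to absorb typos.
    candidates = CANONICAL_EMOTIONS + list(SYNONYMS)
    matches = difflib.get_close_matches(text, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return SYNONYMS.get(best, best)
```

Inputs that return `None` could be logged for manual review rather than silently dropped, which would also show which synonyms are worth adding to the mapping.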
User Story
As an engineer, I need to ensure that no values repeat themselves in the database, so that the database remains in 3rd normal form.
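As a rough illustration of the user story, repeated values can be kept out of a lookup table by combining a `UNIQUE` constraint with a "get or create" helper. The issue does not name the database, so SQLite (via the standard library's `sqlite3`) is used purely as a stand-in, and the `emotion` table, its columns, and both function names are hypothetical. A uniqueness constraint alone does not make a schema 3rd normal form; it only stops the same value being stored twice in one table.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical lookup table whose values cannot repeat."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emotion ("
        "emotion_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL UNIQUE)"
    )


def get_or_create_emotion_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the id for `name`, inserting a row only if it is not present."""
    # INSERT OR IGNORE is SQLite syntax: the insert is skipped when the
    # UNIQUE constraint on `name` would be violated.
    conn.execute("INSERT OR IGNORE INTO emotion (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT emotion_id FROM emotion WHERE name = ?", (name,)
    ).fetchone()
    return row[0]
```

Other tables would then store the returned `emotion_id` instead of the text, so each emotion name is held in exactly one place.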