Open hhm970 opened 2 weeks ago
Extraction: We will use Google Forms for the daily survey, where data will be stored in Google Sheets, and extracted via the Google Sheets API
Transform: Input data into a pandas DataFrame, performing any necessary data-cleaning in the process
Load: Insert the cleaned data, making sure every value goes into its corresponding table
Description
Not all user input going into the database will be tickboxes; some of the entries will involve free text (e.g. the user's emotions). We will need to clean the inputs on user emotions.
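A minimal sketch of how the emotion cleaning could work, combining the dict mapping and fuzzy string matching ideas raised in this issue. The emotion list, the alias dict, the `clean_emotion` name and the 0.8 cutoff are all assumptions for illustration, not decisions made here; fuzzy matching uses `difflib` from the standard library.

```python
import difflib

# Hypothetical canonical labels and aliases; the real lists would come
# from the survey design, which this issue does not specify.
CANONICAL_EMOTIONS = ["happy", "sad", "angry", "anxious", "calm"]
ALIASES = {"joyful": "happy", "mad": "angry", "nervous": "anxious"}


def clean_emotion(raw, cutoff=0.8):
    """Map a free-text emotion to a canonical label, or None if unrecognised."""
    text = raw.strip().lower()
    if not text:
        return None
    # 1. Exact match against the canonical labels.
    if text in CANONICAL_EMOTIONS:
        return text
    # 2. Dict mapping for known synonyms.
    if text in ALIASES:
        return ALIASES[text]
    # 3. Fuzzy match to catch typos such as "hapy".
    matches = difflib.get_close_matches(text, CANONICAL_EMOTIONS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Checking the dict before the fuzzy step matters: "mad" is a close string to "sad", so synonyms should be resolved by the mapping first. Entries that return `None` could be logged for manual review rather than dropped.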
Possible obstacles include:
fuzzy string matching, or a dict mapping to solve this (or both?)
Required Files
./pipeline
User Story
As an engineer, I need to ensure that no values repeat themselves in the database, so that the database remains in 3rd normal form.
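One way to keep values from repeating is to store each emotion once in a lookup table and reference it by id. The sketch below assumes SQLite and hypothetical table and column names (`emotion`, `response`, `user_id`, `survey_date`); the issue does not name the database engine or the schema, so treat it as an illustration of the pattern only.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: emotion names live in one lookup table,
    # and the UNIQUE constraint stops the same name being stored twice.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS emotion (
            emotion_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS response (
            response_id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            survey_date TEXT NOT NULL,
            emotion_id INTEGER NOT NULL REFERENCES emotion(emotion_id)
        );
        """
    )


def get_or_create_emotion_id(conn, name):
    # INSERT OR IGNORE does nothing if the name already exists.
    conn.execute("INSERT OR IGNORE INTO emotion (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT emotion_id FROM emotion WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def load_response(conn, user_id, survey_date, emotion_name):
    # Store the foreign key, not the emotion text, in the response row.
    emotion_id = get_or_create_emotion_id(conn, emotion_name)
    conn.execute(
        "INSERT INTO response (user_id, survey_date, emotion_id) VALUES (?, ?, ?)",
        (user_id, survey_date, emotion_id),
    )
    conn.commit()
```

With this layout, loading "happy" for several users adds one row to `emotion` and one row per user to `response`.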