Closed by cassws 4 years ago
problem: if the Google Script autogenerates pids, it could accidentally change a pid that has already gone public, which would break a URL.
potential solution: during manual curation, the curator cleans the data up a bit (standardizing free-text responses, for example) and then manually assigns a pid. The script can then treat the presence of a pid as the indicator that a record is ready to go, and start cleaning that record.
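A minimal sketch of the "pid means ready" rule, written in Python for illustration only (the real check would live in the Google Sheet script). The function name `ready_records` and the column name `pid` are assumptions, not names taken from the repo. It never generates or changes a pid; it only selects rows that already have one and rejects duplicates, since two records sharing a pid would collide on the same URL.

```python
def ready_records(rows, pid_field="pid"):
    """Return only the rows a curator has marked ready by assigning a pid.

    rows: list of dicts, one per sheet row (hypothetical shape).
    pid_field: name of the pid column (assumed to be "pid").
    """
    seen = set()
    ready = []
    for row in rows:
        pid = (row.get(pid_field) or "").strip()
        if not pid:
            # No pid yet: the curator has not approved this row, so skip it.
            continue
        if pid in seen:
            # A repeated pid would map two records to one URL.
            raise ValueError(f"duplicate pid: {pid}")
        seen.add(pid)
        ready.append(row)
    return ready
```

Because the pid is only ever read, a pid that has already gone public cannot be changed by this step.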
question: should manual curation happen in a separate tab (or even a separate sheet) from the one populated by the Google Form? If so, should there be one combined cleaned tab, or a separate cleaned tab for each collection? We could end up with three curated tabs (datavis, datasets, other) and then three more tabs that have been cleaned by the script and safe lists. I would really rather not have the script overwriting anything, and ideally the curator would not overwrite the original form submission either.
Current solution: one manual cleaning step, in which the curator adds a pid to approved examples, standardizes free-text responses, and manually adds pipes to free-text lists. The Google Sheet script then automatically cleans the lists that come from multiple-choice submission questions and splits the submissions into a separate tab for each collection. Each tab can then be exported as a .csv and saved into the repo's data-tools directory, and the python script is run on the .csvs to generate json.
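The last step (exported .csv to json) could look roughly like the sketch below. This is not the repo's actual script: the function name `csv_to_records`, the column names, and the choice of which columns hold pipe-delimited lists (`LIST_FIELDS`) are all hypothetical. It only shows the idea of splitting the pipe-separated cells into json arrays.

```python
import csv
import io
import json

# Hypothetical: columns whose cells hold pipe-delimited lists.
LIST_FIELDS = ("tags",)


def csv_to_records(csv_text, list_fields=LIST_FIELDS):
    """Parse exported CSV text into a list of dicts, splitting list cells on '|'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = dict(row)
        for field in list_fields:
            raw = record.get(field) or ""
            # Split on pipes, trim whitespace, and drop empty items.
            record[field] = [item.strip() for item in raw.split("|") if item.strip()]
        records.append(record)
    return records


# Made-up sample row standing in for one exported collection tab.
sample = "pid,title,tags\nex1,Example one,maps | charts\n"
print(json.dumps(csv_to_records(sample), indent=2))
```

For a real export, the text would be read from the .csv saved in the data-tools directory instead of the inline `sample` string.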
as per #63