melsod / ManyBabies1-Qualtrics

Survey data for MB1

Labs with duplicate entries #11

Open mekline opened 6 years ago

mekline commented 6 years ago

The major data cleaning task at this stage is consolidating so that each lab has exactly one entry!! In most cases the most recent entry is the correct one, but sometimes e.g. a lab enters a questionnaire that's empty except for just a few new answers.

Here's the procedure I'm using, which uses the code at line 181 of qualtrics-data-formatting.R:

(1) View the next lab from the list on line 259, e.g.

# 8 cfnuofn 2

(2) In line 181, get and view that lab's lines: View(filter(lab_questionnaire_raw, lab == 'PUT LABID HERE'))

(3) Inspect them and determine what's up! If there's just one line to keep, keep it (by dropping the others by date), and if any columns should be added from another entry, add them back! The baldwinlabuoregon entry shows the best way I've found of doing this: set aside the line you're going to drop, drop the unwanted lines from the main dataframe, then re-add the needed columns from the set-aside line.
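The three steps above can be sketched in R roughly as follows. This is only a minimal sketch on a toy data frame, not the actual code in qualtrics-data-formatting.R. It assumes dplyr is loaded (the filter() call in step 2 suggests it) and that each row has `lab` and `StartDate` columns. The toy values, the `otherlab` row, and the names `dup_labs`, `set_aside`, and `kept` are made up for illustration.

```r
library(dplyr)

# Toy stand-in for lab_questionnaire_raw (illustrative values only)
lab_questionnaire_raw <- tibble(
  lab = c("baldwinlabuoregon", "baldwinlabuoregon",
          "baldwinlabuoregon", "otherlab"),
  StartDate = c("2017-11-28 18:49:31", "2018-01-15 18:39:07",
                "2018-02-13 23:03:07", "2018-01-02 10:00:00"),
  Screening = c(NA, "yes", NA, "no"),
  Compensation = c(NA, "book", NA, "none")
)

# (1) List the labs that have more than one entry
dup_labs <- lab_questionnaire_raw %>%
  count(lab) %>%
  filter(n > 1)

# (2) Look at one lab's lines (wrap in View() when working interactively)
filter(lab_questionnaire_raw, lab == "baldwinlabuoregon")

# (3) Set aside the line whose answers are still needed, drop the unwanted
#     lines, then copy the needed columns back into the line that was kept
set_aside <- filter(lab_questionnaire_raw,
                    lab == "baldwinlabuoregon",
                    StartDate == "2018-01-15 18:39:07")

lab_questionnaire <- lab_questionnaire_raw %>%
  filter(!(lab == "baldwinlabuoregon" &
             StartDate %in% c("2018-01-15 18:39:07", "2017-11-28 18:49:31")))

kept <- lab_questionnaire$lab == "baldwinlabuoregon"
lab_questionnaire$Screening[kept] <- set_aside$Screening
lab_questionnaire$Compensation[kept] <- set_aside$Compensation
```

Filtering on StartDate rather than on row position keeps the drop step stable even if the raw file is re-exported in a different order.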


This will all need to be done for lab_debrief and lab_secondary as well!!

I'm writing this out primarily to remind myself later, and I plan to keep chugging along on this, but if anyone feels inspired to help, here's a less coding-intensive way to do it:

Using the instructions in (1) and (2), look at the duplicate lines. The date string in StartDate is a unique ID. If you can figure out what should be kept, record it in a commented line below each lab's id (starting at ~line 259) like this:

#labid
# keep: ''
# drop: ''
# BUT keep these columns from '':
#

So for instance, the entry for baldwinlabuoregon (which originally had three entries) would look like this:

#baldwinlabuoregon
# keep: '2018-02-13 23:03:07'
# drop: '2018-01-15 18:39:07' and '2017-11-28 18:49:31'
# BUT keep these columns from '2018-01-15 18:39:07':
#Screening #Compensation etc.
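For anyone curious how a record like this could be turned back into code, here is one possible sketch in base R. The helper `apply_decision` and its argument names are hypothetical (nothing like it exists in the repo as far as this thread shows), and the toy data frame is made up. It only assumes that the data frame has `lab` and `StartDate` columns, so in principle the same helper could be pointed at lab_debrief or lab_secondary.

```r
# Hypothetical helper: apply one recorded keep/drop decision.
# keep_date  - StartDate of the entry to keep
# drop_dates - StartDate values of the entries to drop
# donor_date - StartDate of a dropped entry whose columns should be kept
# cols       - names of the columns to copy from the donor entry
apply_decision <- function(df, labid, keep_date, drop_dates,
                           donor_date = NULL, cols = character()) {
  is_keep <- df$lab %in% labid & df$StartDate %in% keep_date
  stopifnot(sum(is_keep) == 1)  # exactly one entry must be kept

  if (!is.null(donor_date) && length(cols) > 0) {
    # Set aside the donor line before anything is dropped
    donor_row <- df[df$lab %in% labid & df$StartDate %in% donor_date, ,
                    drop = FALSE]
    stopifnot(nrow(donor_row) == 1)
    for (col in cols) {
      df[[col]][is_keep] <- donor_row[[col]]
    }
  }

  # Drop the unwanted entries for this lab only
  df[!(df$lab %in% labid & df$StartDate %in% drop_dates), , drop = FALSE]
}

# Toy data (illustrative values only)
toy <- data.frame(
  lab = c("baldwinlabuoregon", "baldwinlabuoregon",
          "baldwinlabuoregon", "otherlab"),
  StartDate = c("2017-11-28 18:49:31", "2018-01-15 18:39:07",
                "2018-02-13 23:03:07", "2018-01-02 10:00:00"),
  Screening = c(NA, "yes", NA, "no"),
  Compensation = c(NA, "book", NA, "none"),
  stringsAsFactors = FALSE
)

cleaned <- apply_decision(
  toy, "baldwinlabuoregon",
  keep_date  = "2018-02-13 23:03:07",
  drop_dates = c("2018-01-15 18:39:07", "2017-11-28 18:49:31"),
  donor_date = "2018-01-15 18:39:07",
  cols       = c("Screening", "Compensation")
)
```

The stopifnot() checks are there so that a mistyped date string fails loudly instead of silently dropping nothing.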

mekline commented 6 years ago

(I forgot to say: doing this requires running the R script up to line 180, though you could also look at the files themselves in qualtrics_raw. But please let me know if anything isn't executing for you!)