sokolhessnerlab / itrackvalr

Toolkit with a built-in pipeline infrastructure to integrate and analyze eye-tracking, behavioral, and self-report data in R.
MIT License

Extract SUQ answers and create enumeration key #9

aridyckovsky closed 2 years ago

aridyckovsky commented 3 years ago

Problem

We need a collection of data extracted from CSN SUQ scans.

Solution

  1. The following items are requested from the SUQ scans, entered into a CSV that meets these requirements:
    • The first column holds each participant's unique string id (e.g., CSN001).
    • Each subsequent column holds the enumerated responses for questions 1 through 6. For example, the second column of the CSV is for question 1 ("How often do you consume alcoholic beverages?"), and its values are the characters "a", "b", .... Check off each question below once its column is filled in for every participant.
      • [x] Question 1
      • [x] Question 2
      • [x] Question 3
      • [x] Question 4
      • [x] Question 5
      • [x] Question 6
  2. An SUQ key is requested as a .md file that includes a file heading with a description, plus a numbered list of questions and their answer enumerations. Such a file may look like:

    # SUQ question-answer key
    
    This is a key for SUQ.
    
    1. How often do you consume alcoholic beverages?
        a. Never
        b. Monthly or less
        c. ...
        ...
    
    2. How many alcoholic beverages do ...
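
The CSV layout in Part 1 can be checked mechanically before the boxes are ticked. Below is a minimal sketch in Python (the toolkit itself is R; Python is used here only for illustration). The column names `id` and `q1` through `q6`, the id pattern `CSN` plus three digits, and the helper name `find_problems` are all assumptions inferred from the examples in this issue, not part of the request.

```python
import csv
import io
import re

ID_PATTERN = re.compile(r"^CSN\d{3}$")             # assumed from the example id CSN001
ANSWER_PATTERN = re.compile(r"^[a-z]$")            # a single lowercase letter
QUESTION_COLUMNS = [f"q{n}" for n in range(1, 7)]  # assumed names for questions 1-6


def find_problems(csv_text):
    """Return (line_number, message) pairs for rows that break the layout."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = ["id"] + QUESTION_COLUMNS
    if reader.fieldnames != expected:
        problems.append((1, f"expected header {expected}, got {reader.fieldnames}"))
        return problems
    for line_number, row in enumerate(reader, start=2):
        if not ID_PATTERN.match(row["id"]):
            problems.append((line_number, f"bad id {row['id']!r}"))
        for column in QUESTION_COLUMNS:
            value = row[column]
            # Empty cells pass here; unanswered questions need a separate decision.
            if value and not ANSWER_PATTERN.match(value):
                problems.append((line_number, f"bad value {value!r} in {column}"))
    return problems
```

An empty result from `find_problems` would mean every row has a well-formed id and at most one lowercase letter per question.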

aridyckovsky commented 3 years ago

FWIW @psokolhessner I'm not 100% sure what form we want these enumerations to take for Part 2 of the Solution in this ticket. The .md file as I describe it mostly just makes the key renderable on GitHub, which we can work with, but it is not yet useful for analysis (unlike, e.g., a .txt file that can be parsed into key-value pairs).

To some extent, I think the best option could be nested objects of key-value pairs, such as in JSON format:

{
  "questions": {
    "1": {
      "a": "Never",
      "b": "Monthly or less",
      ...
    },
    "2": {
      ...
    },
    ...
  }
}

psokolhessner commented 3 years ago

I haven't worked w/ JSON files before. Possible to get a .md to eat/display a .json or parts thereof for consistency/clarity/lack-of-repetition?

psokolhessner commented 3 years ago

{
  "q1": {
    "prompt": "How often do you consume alcoholic beverages?",
    "options": {
      "a": "Never",
      "b": "Monthly or less",
      ...
    }
  },
  "q2": {
    ...
  },
  ...
}
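
On the question of having a .md display the JSON: GitHub-rendered Markdown does not, to my knowledge, pull in the contents of another file, so one way to avoid repetition is to generate the .md from the JSON key. The following is a minimal sketch in Python, assuming the `prompt`/`options` shape shown above; the function name `render_key_markdown` and the title text are illustrative, and the example key contains only the entries quoted in this thread.

```python
import json


def render_key_markdown(key, title="SUQ question-answer key"):
    """Build Markdown text from a dict shaped like {"q1": {"prompt": ..., "options": {...}}}."""
    lines = [f"# {title}", ""]
    for number, question in enumerate(key.values(), start=1):
        lines.append(f"{number}. {question['prompt']}")
        for letter, label in question["options"].items():
            lines.append(f"    {letter}. {label}")
        lines.append("")
    return "\n".join(lines)


# Placeholder key: only the entries quoted in this thread; the remaining options are elided.
example_json = """
{
  "q1": {
    "prompt": "How often do you consume alcoholic beverages?",
    "options": {"a": "Never", "b": "Monthly or less"}
  }
}
"""
print(render_key_markdown(json.loads(example_json)))
```

With this approach the JSON stays the single source of truth, and the Markdown file is regenerated whenever the key changes.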

aridyckovsky commented 3 years ago

Resummarizing this issue's solution

I think we can keep solution Part 1 in the original feature request. Let's stick with the CSV data where columns are per-question and values are the character responses, e.g., a. We need not include quotes in the CSV, since we'll read in the data as characters. For example:

| id | q1 | q2 | ... |
| --- | --- | --- | --- |
| CSN001 | a | c | ... |
| CSN002 | a | b | ... |
| ... | ... | ... | ... |

As for Part 2, the enumeration key, we should use a JSON key as demonstrated in the previous comment (https://github.com/sokolhessnerlab/itrackvalr/issues/9#issuecomment-805132605). This file should be saved as a JSON file, such as suq-key.json. For example:

{
  "q1": {
    "prompt": "How often do you consume alcoholic beverages?",
    "options": {
      "a": "Never",
      "b": "Monthly or less",
      ...
    }
  },
  "q2": {
    "prompt": "How many alcoholic beverages do ...",
    "options": {
      ...
    }
  },
  ...
}
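
To illustrate how the CSV and the JSON key fit together at analysis time, here is a minimal sketch in Python (the toolkit itself is R, so this is illustration only). It assumes the CSV is comma-separated with an `id` column and `q*` columns, and that the key has the `prompt`/`options` shape above; `decode_responses` is a hypothetical helper name.

```python
import csv
import io


def decode_responses(csv_text, key):
    """Map each letter in a q* column to its label; unknown or missing values become None."""
    decoded = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels = {"id": row["id"]}
        for question_id, question in key.items():
            letter = row.get(question_id)
            # dict.get returns None for letters that are absent from the key.
            labels[question_id] = question["options"].get(letter)
        decoded.append(labels)
    return decoded
```

Calling `decode_responses` with the CSV text and the parsed key yields one dict per participant, with letters replaced by their labels and the raw file left unchanged.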

aridyckovsky commented 3 years ago

Including thread from Slack (dated May 24, 2021):

Questions from @miacudahy:

Some participants answered that they have never smoked cannabis but then answered Q5 ("How many times a day") with letter A (5 times or less). Should I enter their answers under Q5 as circled, or leave it blank? Participants (21, 25, 28, 31, 34, 35-42)

Also, some participants who answered that they have never smoked cannabis (Q4) did not answer Q5. Should I leave that blank or put something like n/a? Participants (5, 7, 10, 16, 19, 29)

Response:

Interesting, these are good questions. It sounds like in both cases, the participants all answered “never smoked cannabis” to Q4. I think that implies Q5 might have been missing a “0 times” option for participants who thought “5 times or less” didn’t describe their experience. Let’s go ahead and record the data as best we can:

  • If they responded A “5 times or less”, record A.
  • If they didn’t answer Q5 at all, record NA.

This way we keep a thorough record of what participants actually did for our raw data, and we can figure out how to interpret it during analysis. How does this plan sound to you?

psokolhessner commented 3 years ago

@miacudahy: the "how many mg" cannabis question for the SUQ should have had a '0' or "I have never used cannabis" option, but did not. That makes either no answer or "A" "correct" answers in the sense that, strictly speaking, if I've never used cannabis, then I use 5mg or less. So exactly as @aridyckovsky said, mark them as entered, and later we'll have to deal with this oddity about the SUQ! Thanks @miacudahy
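
For dealing with this oddity later, the affected participants can be pulled out of the raw CSV without altering it. The sketch below is in Python for illustration (the toolkit itself is R) and rests on several assumptions: the columns are named `id`, `q4`, and `q5`; unanswered questions appear as `NA` or an empty cell; and the letter that means "never" on Q4 is passed in by the caller, since it is not stated in this thread. The function name `flag_q5_after_never` is hypothetical.

```python
import csv
import io

MISSING = {"", "NA"}  # NA is the agreed marker for an unanswered question


def flag_q5_after_never(csv_text, never_letter):
    """Split participants who gave `never_letter` on q4 by whether they answered q5."""
    answered, unanswered = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["q4"] != never_letter:
            continue
        if row["q5"] in MISSING:
            unanswered.append(row["id"])
        else:
            answered.append(row["id"])
    return answered, unanswered
```

The two returned lists correspond to the two cases raised above: "never" on Q4 with Q5 answered anyway, and "never" on Q4 with Q5 left blank.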

psokolhessner commented 2 years ago

Data file is now complete with desired column headings (e.g. id, q1, etc).

psokolhessner commented 2 years ago

SUQ JSON key file is now present in the same directory as the SUQ CSV data file (CSN/data/raw/questionnaire/) on the S drive. Assuming I haven't botched the JSON, this closes the issue.