Closed aridyckovsky closed 2 years ago
FWIW @psokolhessner I'm not 100% sure what form we want these enumerations in for part 2 of the Solution in this ticket. The way I describe the potential .md file, it mostly just makes the key renderable on GitHub so that we can work with it, but it does not yet demonstrate usefulness toward analysis (i.e., as a .txt file that can be parsed with key-value pairs).
To some extent, I think the best option could be nested objects of key-value pairs, such as JSON format:
{
"questions": {
"1": {
"a": "Never",
"b": "Monthly or less",
...
},
"2": {
...
},
...
}
}
I haven't worked w/ JSON files before. Possible to get a .md to eat/display a .json or parts thereof for consistency/clarity/lack-of-repetition?
{
"q1": {
"prompt": "How often do you consume alcoholic beverages?",
"options": {
"a": "Never",
"b": "Monthly or less",
...
}
},
"q2": {
...
},
...
}
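On the question of getting a .md to display a .json: one option is to generate the Markdown from the JSON key, so the key stays the single source. Below is a minimal sketch in Python, assuming the key has the shape shown above. The `key_to_markdown` helper and the inline `KEY_JSON` string are hypothetical and only for illustration; only q1 is filled in because the other prompts and options are elided in this thread.

```python
import json

# Hypothetical key in the proposed shape. Only q1 is included because the
# remaining prompts and options are elided in this thread.
KEY_JSON = """
{
  "q1": {
    "prompt": "How often do you consume alcoholic beverages?",
    "options": {
      "a": "Never",
      "b": "Monthly or less"
    }
  }
}
"""


def key_to_markdown(key: dict) -> str:
    """Render a question key as a Markdown numbered list with lettered options."""
    lines = []
    for number, (qid, question) in enumerate(key.items(), start=1):
        lines.append(f"{number}. {question['prompt']} ({qid})")
        for letter, label in question["options"].items():
            # Indent options so they nest under the numbered question.
            lines.append(f"   - {letter}: {label}")
    return "\n".join(lines)


markdown = key_to_markdown(json.loads(KEY_JSON))
print(markdown)
```

In practice the JSON would be read from the key file and the output written to a .md file next to it, so the rendered key never drifts from the parsed one.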
I think we can keep solution Part 1 in the original feature request. Let's stick with the CSV data where columns are per-question and values are the character responses, e.g., a. We need not include quotes in the CSV, since we'll read in the data as characters. For example:
id | q1 | q2 | ... |
---|---|---|---|
CSN001 | a | c | ... |
CSN002 | a | b | ... |
... | ... | ... | ... |
As for Part 2, the enumeration key, we should use a JSON key as demonstrated in the previous comment https://github.com/sokolhessnerlab/itrackvalr/issues/9#issuecomment-805132605. This file should be saved as a JSON file, such as suq-key.json.
For example:
{
"q1": {
"prompt": "How often do you consume alcoholic beverages?",
"options": {
"a": "Never",
"b": "Monthly or less",
...
}
},
"q2": {
"prompt": "How often do you consume alcoholic beverages?",
"options": {
...
}
},
...
}
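To show how the CSV and the JSON key would fit together at analysis time, here is a rough sketch in Python that maps each response letter to its label. It assumes the column names match the key's question ids (q1, q2, ...). The `decode_rows` helper, the inline strings, and the row values are illustrative assumptions, not the real data.

```python
import csv
import io
import json

# Illustrative stand-ins for the CSV data file and the JSON key file.
CSV_TEXT = """id,q1
CSN001,a
CSN002,b
"""

KEY_JSON = """
{
  "q1": {
    "prompt": "How often do you consume alcoholic beverages?",
    "options": {"a": "Never", "b": "Monthly or less"}
  }
}
"""


def decode_rows(csv_text: str, key: dict) -> list:
    """Replace each response letter with its label from the key."""
    decoded = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        out = {"id": row["id"]}
        for qid, question in key.items():
            letter = row.get(qid)
            # Letters not in the key (or missing columns) decode to None.
            out[qid] = question["options"].get(letter)
        decoded.append(out)
    return decoded


decoded = decode_rows(CSV_TEXT, json.loads(KEY_JSON))
for record in decoded:
    print(record)
```

Keeping the raw letters in the CSV and the labels in the key means the labels can be corrected later without touching the recorded data.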
Including thread from Slack (dated May 24, 2021):
Questions from @miacudahy:
Some participants answered that they have never smoked cannabis but then answered Q5 "How many times a day" with letter A (5 times or less). Should I enter their answers under Q5 as they are circled, or leave it blank? Participants (21,25,28,31,34,35-42)
Also, some participants who answered that they have never smoked cannabis (Q4) did not answer Q5. Should I leave that blank or put something like n/a? Participants (5,7,10,16,19,29)
Response:
Interesting, these are good questions. It sounds like in both cases, the participants all answered “never smoked cannabis” to Q4. I think that implies Q5 might have been missing a “0 times” option for participants who thought “5 times or less” didn’t describe their experience. Let’s go ahead and record the data as best we can:
- If they responded A “5 times or less”, record A.
- If they didn’t answer Q5 at all, record NA.
This way we keep a thorough record of what participants actually did for our raw data, and we can figure out how to interpret it during analysis. How does this plan sound to you?
@miacudahy: the "how many mg" cannabis question for the SUQ should have had a '0' or "I have never used cannabis" option, but did not. That makes either no answer or "A" "correct" answers in the sense that, strictly speaking, if I've never used cannabis, then I use 5mg or less. So exactly as @aridyckovsky said, mark them as entered, and later we'll have to deal with this oddity about the SUQ! Thanks @miacudahy
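For the later analysis step, the two cases discussed here could be flagged rather than silently recoded. The following is a sketch only: the letter that means "never" on Q4 is not stated in this thread, so `NEVER_Q4 = "a"` is an assumption, and the ids and rows are made up for illustration.

```python
import csv
import io

# ASSUMPTION: the Q4 letter meaning "never" is not given in this thread.
NEVER_Q4 = "a"

# Made-up rows covering the two cases discussed plus an ordinary response.
CSV_TEXT = """id,q4,q5
EX001,a,a
EX002,a,NA
EX003,b,c
"""


def classify_q5(row: dict, never_letter: str = NEVER_Q4) -> str:
    """Label how a participant's Q5 entry relates to their Q4 answer."""
    # Treat both an empty cell and the literal string "NA" as no answer.
    q5 = None if row["q5"] in ("", "NA") else row["q5"]
    if row["q4"] != never_letter:
        return "ok"
    return "never_but_answered" if q5 is not None else "never_and_blank"


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
flags = {row["id"]: classify_q5(row) for row in rows}
print(flags)
```

This leaves the raw file exactly as entered and pushes the interpretation of Q5 into a separate, reviewable step.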
Data file is now complete with desired column headings (e.g. id, q1, etc).
SUQ JSON key file is now present in the same directory as the SUQ CSV data file (CSN/data/raw/questionnaire/) on the S drive. Assuming I haven't botched the JSON, this closes the issue.
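A quick way to check that the JSON is not botched is to parse it and confirm each question has a prompt and a non-empty set of options. Here is a minimal sketch in Python, assuming the key shape agreed on earlier in this thread. The `validate_key` helper is hypothetical, and in practice the text would come from the key file rather than an inline string.

```python
import json


def validate_key(text: str) -> list:
    """Return a list of problems found in a question-key JSON string (empty if none)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} at line {err.lineno}"]
    if not isinstance(key, dict):
        return ["top level must be an object"]
    problems = []
    for qid, question in key.items():
        if not isinstance(question, dict):
            problems.append(f"{qid}: expected an object")
            continue
        if not isinstance(question.get("prompt"), str):
            problems.append(f"{qid}: missing or non-string 'prompt'")
        options = question.get("options")
        if not isinstance(options, dict) or not options:
            problems.append(f"{qid}: missing or empty 'options'")
    return problems


good = '{"q1": {"prompt": "How often do you consume alcoholic beverages?", "options": {"a": "Never"}}}'
bad = good + "}"  # a stray closing brace makes the text invalid JSON

print(validate_key(good))
print(validate_key(bad))
```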
Problem
We need a collection of data extracted from CSN SUQ scans.
Solution
id (i.e., CSN001).
An SUQ key is requested as a .md file that includes a file heading with description, plus a numbered list of questions and answer enumerations. Such a file may look like: