ropensci / ozunconf18

repository for the rOpenSci ozunconference 2018

🔍Parser for Qualtrics survey 📋 files (QSF) to 2D Metadata #5

Open Lingtax opened 5 years ago

Lingtax commented 5 years ago

Human- and machine-readable metadata is important for making data open.

Users of Qualtrics (a widely used online survey platform) have access to JSON metadata files (labelled QSF), but the nested layout of these files makes them hard for humans to read.

I think there is scope to develop functions (either for integration into the existing qualtRics package or as a standalone tool) that read the critical information from these files and reorganise it into a flat, 2D structure. That would streamline, and therefore promote, the production of comprehensive metadata files.
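As a rough illustration of the kind of flattening meant here, the sketch below turns a nested, QSF-like list into a data frame with one row per question. Everything in it is an assumption made for illustration: the field names (`SurveyElements`, `Element`, `PrimaryAttribute`, `Payload`, `QuestionText`, `DataExportTag`, `QuestionType`, `Choices`, `Display`) reflect how QSF exports are commonly laid out rather than a documented specification, and `qsf_to_table()` and `or_na()` are hypothetical helpers, not part of the qualtRics package. The example data are hand-written so the code runs with base R only; a real file could be read into the same shape with `jsonlite::fromJSON(path, simplifyVector = FALSE)`.

```r
# Hypothetical stand-in for a parsed QSF file (field names are assumptions).
qsf <- list(
  SurveyEntry = list(SurveyName = "Minimal example"),
  SurveyElements = list(
    list(Element = "BL", PrimaryAttribute = "Survey Blocks", Payload = list()),
    list(
      Element = "SQ",
      PrimaryAttribute = "QID1",
      Payload = list(
        QuestionText = "How satisfied are you?",
        DataExportTag = "Q1",
        QuestionType = "MC",
        Choices = list(
          "1" = list(Display = "Satisfied"),
          "2" = list(Display = "Dissatisfied")
        )
      )
    ),
    list(
      Element = "SQ",
      PrimaryAttribute = "QID2",
      Payload = list(
        QuestionText = "Any other comments?",
        DataExportTag = "Q2",
        QuestionType = "TE"
      )
    )
  )
)

# Convert a possibly missing field to a character value, using NA for NULL.
or_na <- function(x) if (is.null(x)) NA_character_ else as.character(x)

# Flatten the survey-question ("SQ") elements into one row per question.
qsf_to_table <- function(qsf) {
  is_question <- vapply(
    qsf$SurveyElements,
    function(el) identical(el$Element, "SQ"),
    logical(1)
  )
  questions <- qsf$SurveyElements[is_question]

  rows <- lapply(questions, function(el) {
    p <- el$Payload
    # Questions without choices (e.g. text entry) yield character(0) here.
    choices <- vapply(p$Choices, function(ch) or_na(ch$Display), character(1))
    data.frame(
      question_id = or_na(el$PrimaryAttribute),
      export_tag  = or_na(p$DataExportTag),
      type        = or_na(p$QuestionType),
      text        = or_na(p$QuestionText),
      choices     = if (length(choices) > 0) {
        paste(choices, collapse = " | ")
      } else {
        NA_character_
      },
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

metadata <- qsf_to_table(qsf)
print(metadata)
```

Collapsing the choices into a single delimited column keeps the table strictly 2D; a long format with one row per choice would be an equally reasonable design.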

Happy to provide examples.

njtierney commented 5 years ago

Sounds great, @Lingtax ! I haven't worked much with qualtrics before, although I have answered many surveys from them! I'd be interested in seeing some examples :)

sadian commented 5 years ago

@Lingtax

I really like this idea of making the existing packages more user friendly for JSON to R conversion.

Thanks for the suggestion!

ekothe commented 5 years ago

+1 @Lingtax

Also, given that Qualtrics allows survey creation via import of .QSF files, this would be a useful step towards being able to create new surveys programmatically. I'm wary of scope creep, but it would be nice if the solutions we come up with are mindful of that possible option in the future.

Lingtax commented 5 years ago

@ekothe, we should build a survey with every type of item (full test) and a survey with one block and item (minimal example) ahead of the unconf to use as resources if this gets up.

ekothe commented 5 years ago

Qualtrics QFSs.zip

This has a minimal example with just one question and a maximal example with all question types and multiple blocks. Survey flow, randomisation, loop and merge, parsed text etc. are not demonstrated.

Lingtax commented 5 years ago

Do you have the preview link to the survey, as well?

ekothe commented 5 years ago

https://researchsurveys.deakin.edu.au/jfe/form/SV_3Iw83SNMJNrSZYp <- Maximal
https://researchsurveys.deakin.edu.au/jfe/form/SV_0qvLkxT2WUCDpUp <- Minimal

Note that if you complete the maximal as a participant your metadata and IP address will be collected and I'll have access to both, but I promise not to use them for evil.

ekothe commented 5 years ago

There are some things in this repo (https://github.com/emmamorgan-tufts/QualtricsTools) that are useful, but it does some things I don't think you'd consider in scope, @Lingtax, and skips some things you do want.

kcf-jackson commented 5 years ago

@Lingtax

I like the idea of developing some standalone parsing functions. "Parser combinators", a family of higher-order functions for building parsers out of smaller parsers, may be of interest here.

The functions in a combinator library look something like the following.

Some examples of parsers built with the combinators:

digits <- any_of(as.character(0:9))
lower <- any_of(letters)
upper <- any_of(LETTERS)
all_letters <- or_else(lower, upper)
whitespace <- one_or_more(" ")
special_char <- any_of(c("_", "-", "=", ".", "/", "{", "}"))   # non-exhaustive
symbols <- choice(all_letters, whitespace, digits, special_char)

Then the use case would look like:

# Use case 1
if_condition <- between("if (", zero_or_more(symbols), ")")

input_str <- "if (i == 1) { ... }"
if_condition(input_str)   # should return list("i == 1", " { ... }")

# Use case 2
before_dollar <- string("Price: $")
dollar <- and_then(zero_or_one(and_then(digits, ".")), one_or_more(digits))
price_tag_parser <- xthen(before_dollar, dollar)

input_str <- "Price: $3.95"
price_tag_parser(input_str)  # should return list("3.95", "")
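
For anyone who wants to try the idea, here is a minimal base-R sketch of how a few of these combinators could be defined. The names mirror the ones used above, but the implementation is an assumption made for illustration and is not the API of the R implementation listed in the references. In particular, it assumes a parser is a function that returns `list(value, rest)` on success and `NULL` on failure, and that literals are wrapped explicitly with `string()` rather than passed as bare strings.

```r
# A parser is a function: input string -> list(value, rest), or NULL on failure.

# Match a literal string at the start of the input.
string <- function(s) {
  force(s)
  function(input) {
    if (startsWith(input, s)) {
      list(value = s, rest = substring(input, nchar(s) + 1))
    } else {
      NULL
    }
  }
}

# Try p1; if it fails, try p2 on the same input.
or_else <- function(p1, p2) {
  force(p1); force(p2)
  function(input) {
    result <- p1(input)
    if (is.null(result)) p2(input) else result
  }
}

# Match any one of the given literal strings.
any_of <- function(strings) Reduce(or_else, lapply(strings, string))

# Run p1, then p2 on what is left; concatenate the matched text.
and_then <- function(p1, p2) {
  force(p1); force(p2)
  function(input) {
    r1 <- p1(input)
    if (is.null(r1)) return(NULL)
    r2 <- p2(r1$rest)
    if (is.null(r2)) return(NULL)
    list(value = paste0(r1$value, r2$value), rest = r2$rest)
  }
}

# Apply p repeatedly until it fails; always succeeds (possibly matching "").
zero_or_more <- function(p) {
  force(p)
  function(input) {
    value <- ""
    repeat {
      r <- p(input)
      # Stop on failure, or if p succeeded without consuming any input.
      if (is.null(r) || identical(r$rest, input)) break
      value <- paste0(value, r$value)
      input <- r$rest
    }
    list(value = value, rest = input)
  }
}

# Require at least one match of p.
one_or_more <- function(p) and_then(p, zero_or_more(p))

# Match p if possible; otherwise succeed without consuming input.
zero_or_one <- function(p) {
  force(p)
  function(input) {
    r <- p(input)
    if (is.null(r)) list(value = "", rest = input) else r
  }
}

# Match p1 then p2, but keep only p2's value.
xthen <- function(p1, p2) {
  force(p1); force(p2)
  function(input) {
    r1 <- p1(input)
    if (is.null(r1)) return(NULL)
    p2(r1$rest)
  }
}

digits <- any_of(as.character(0:9))
number <- one_or_more(digits)
dollar <- and_then(zero_or_one(and_then(number, string("."))), number)
price_tag_parser <- xthen(string("Price: $"), dollar)

result <- price_tag_parser("Price: $3.95")
print(result)
```

Returning `NULL` on failure keeps the sketch short; a fuller version would probably return an explicit failure object carrying an error message and position.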

References

  1. Understanding parser combinators: a deep dive - Scott Wlaschin
  2. Higher-Order Functions for Parsing
  3. Monadic Parser Combinators
  4. An R implementation.

(2 and 3 are key references, but I find them a bit hard to read. 1 explains the concept using F#; I have absolutely no idea about F#, but the speaker explains the concepts so well that I find it quite easy to follow. 4 is an R implementation.)