jhelvy / surveydown

An attempt to build a markdown-based survey platform using Quarto & Shiny

Frameworks #1

Closed jhelvy closed 3 weeks ago

jhelvy commented 1 year ago

In building out this package, I am considering two quite different frameworks. This issue discusses them both.

Option 1

This is the original framework discussed in my blog post. It is a "disaggregated" framework in that the survey is defined by multiple Rmd / qmd files that are linked together with a _survey.yml file, and the survey questions are all defined in a _questions.yml file. A functioning prototype built on {shinysurveys} is available in this repo here.

Some benefits of this framework are that some of the "high-level" survey design elements are easily visible. For example, the overall survey flow logic is clear and easy to modify by just editing the _survey.yml file. Likewise, the survey questions are all centralized in the _questions.yml file, making them equally easy to quickly view and modify.

A downside to this framework is that the user has to open and edit multiple different files to get a fully-functioning survey. It is also not clear how the user might preview the rendered survey (other than simply compiling the survey and taking it themselves). Perhaps a preview() function could be made, or perhaps in the _survey.yml file there could be other elements that control things like having the survey in "preview" mode, as well as other global options, like CSS, themes, etc. This could be similar to how Quarto / RMarkdown websites work.

Option 2

In this option, the entire survey would be designed in a single .qmd file (I'm using Quarto over RMarkdown as a default for now). An example of this framework can be seen in this repo here, though this one is not yet functioning. In this framework, the .qmd file feels much more like editing a xaringan or Quarto presentation file in that all of the content is there in a single file. Questions could be defined here in code chunks using a simple function, e.g. q(), that takes all the same arguments that would otherwise be defined in the _questions.yml file in Option 1. Survey control logic could be defined using options just after the --- page breaks. For example, a skip option could be put in place to allow control over skipping.

This framework has the nice benefit of being able to edit and preview the entire survey from one place. There are fewer files, and the integration between markdown and code feels tighter. A simple yaml at the top of this file could also allow the user to specify global options like CSS, themes, etc. The ability to define questions in code chunks is also useful. For example, if the user wanted a select type question and wanted to include a series of values as options, they could define that series using R code, e.g. 1900:2023 (for "year of birth"). This could potentially be done in Option 1 too in the _questions.yml file if we allow for in-line code, e.g. `r 1900:2023`. But in general it feels like there would be more flexibility in defining the questions with this framework, such as the ability to read in an external file to define options.

A downside to this framework is that the handling of the survey control logic and the questions themselves is a little clumsy. It isn't as obvious how and where things like skipping questions are being done, as the user has to parse through the file looking for a skip option, and they also have to remember to set names for the pages that are involved in skipping, etc.

Discussion

Maybe there is a compromise between these two. I still in general prefer Option 1 as it feels easier to separate out the design of the different survey elements:

  1. General survey content (text, images, etc).
  2. Survey questions.
  3. Survey control logic.

Every survey has at least these 3 elements, and being able to separately define each feels convenient. The issue of previewing the results could probably be handled by just making a function, e.g. surveydown::preview(). A page argument could allow the preview to start at a given location in the survey, surveydown::preview(page = 'screener').

The main aspect I would want improved in this framework is to allow more flexibility in how question options are defined in _questions.yml. The inclusion of in-line code for options is a big step in the right direction here.

Another needed change is more options in the _survey.yml file. For example, some overall global options could be added alongside the control logic, something like this:

name: "demo"
title: "Surveydown Demo"
css: style.css
control:
  - welcome.qmd
  - screener.qmd
  - skip:
      condition: age > 40
      destination: end_screen.qmd
  - other.qmd
  - end.qmd
  - stop
  - end_screen.qmd
  - stop
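
To make the intended flow concrete, here is a minimal sketch of how a control list like the one above might be walked. It is written in Python purely for illustration and is not part of the package. The list semantics (pages shown in order, a skip entry that jumps to its destination when the condition holds, and stop ending the survey) are assumptions read off the example, and `pages_shown` and `condition_holds` are hypothetical names. It also assumes all responses are known up front, which a real survey would not have.

```python
import operator

# Hypothetical in-memory form of the `control` list from _survey.yml.
CONTROL = [
    "welcome.qmd",
    "screener.qmd",
    {"skip": {"condition": "age > 40", "destination": "end_screen.qmd"}},
    "other.qmd",
    "end.qmd",
    "stop",
    "end_screen.qmd",
    "stop",
]

OPS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "==": operator.eq, "!=": operator.ne,
}


def condition_holds(condition, responses):
    """Evaluate a 'name op number' condition such as 'age > 40'."""
    name, op, value = condition.split()
    return OPS[op](responses[name], float(value))


def pages_shown(control, responses):
    """Return the pages a respondent would see, in order."""
    shown = []
    i = 0
    while i < len(control):
        step = control[i]
        if step == "stop":
            break
        if isinstance(step, dict):
            skip = step["skip"]
            if condition_holds(skip["condition"], responses):
                # Jump to the destination page and continue from there.
                i = control.index(skip["destination"])
                continue
        else:
            shown.append(step)
        i += 1
    return shown
```

With this reading, a respondent aged 45 is routed from the screener straight to end_screen.qmd, while a respondent aged 30 continues through other.qmd and end.qmd.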
jdimeo commented 1 year ago

So we went "all in" on the "disaggregated" approach, which is way more powerful but way less literate than the path you are on. Not saying one is better or worse, but ours facilitates a lot of modularity and re-use (at the expense of a top-to-bottom file as you describe). Here are some examples (totally fake; our real use case is Child Development Monitoring & Evaluation).

We have separate YAML files for response scales:

---
values:
  - value: 0
    label:
      key: min
      en-US: Incorrect
      sw-TZ: Kosa
      id-ID: Salah
  - value: 1
    label:
      key: max
      en-US: Correct
      sw-TZ: Sahihi
      id-ID: Benar

(note the ability to translate/localize the survey- we bidirectionally integrate with https://tolgee.io/)

which can then be included in other scales:

---
include:
  - "incorrect-correct"
values:
  - value: 2
    label:
      key: notask
      en-US: Did not ask
      sw-TZ: "Je, si kuuliza"
      id-ID: Tidak bertanya

Or in variables: in their entirety, mix n' match, or even "merged" (using numeric values or keys to override values).

id: travel_enjoyment
type: Input
name:
  en-US: Travel enjoyment
definition:
  prompt:
    en-US: How often do you travel?
    en-GB: How often do you go on holiday? # Contextualized English a form of "translation"
  scale:
    include:
      - "all-most-some-no-time" # Included scale from the scale library
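
As a rough illustration of the include mechanism, the sketch below expands a scale's includes and lets the scale's own options override included ones that share a numeric value. It is Python for illustration only, not the framework described here. The library dictionary, the scale name "incorrect-correct-notask", and the function `resolve_scale` are all hypothetical, and the merge-by-value rule is an assumption based on the description above.

```python
# Hypothetical scale library keyed by file name (without extension).
SCALES = {
    "incorrect-correct": {
        "values": [
            {"value": 0, "label": {"key": "min", "en-US": "Incorrect"}},
            {"value": 1, "label": {"key": "max", "en-US": "Correct"}},
        ]
    },
    "incorrect-correct-notask": {
        "include": ["incorrect-correct"],
        "values": [
            {"value": 2, "label": {"key": "notask", "en-US": "Did not ask"}},
        ],
    },
}


def resolve_scale(name, library):
    """Return the scale's options with includes expanded, sorted by value."""
    scale = library[name]
    merged = {}
    for included in scale.get("include", []):
        for option in resolve_scale(included, library):
            merged[option["value"]] = option
    # The scale's own options override included ones with the same value.
    for option in scale.get("values", []):
        merged[option["value"]] = option
    return [merged[value] for value in sorted(merged)]
```

The sketch does not guard against circular includes; a real resolver would need to.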

A key concept in our framework is variants. So you can adjust the wording or the response scale for young kids vs. old kids vs. caregivers or by country (or other ways). This is also the way we capture relevance or skip logic... under a dynamic condition, disable this variable or groups of variables. Here's an example of a consent question that is worded for use with an enumerator/interviewer, but uses a variant to override the prompt if the given survey is for "Self-enumeration" - the respondent is using the tablet themselves.

---
id: base_consent
aliases:
  - "intro_informed_conse"
  - "verbal_consent"
type: Input
dataType: Binary
name:
  en-US: Informed Consent
definition:
  prompt:
    en-US: >
      Do you agree to participate in this interview and to share this information?
  notes:
    en-US: >
      If the response is yes, continue with the survey. If the response is no, finalize and exit the survey.
  scale:
    include:
      - "no-yes"
variants:
  - for: Self-Enumeration
    override:
      prompt:
        en-US: Are you willing to complete the survey?

or an example of the dynamic variant and/or skip condition:

variants:
  - when: travel_enjoyment.value != travel_enjoyment.scale.min # Not "none" - the scale option with key "min". Could also have done > 0
    override:
      enabled: false
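
A minimal sketch of how a static `for` variant could be applied, in Python for illustration only. The function `apply_variants` and the "Interviewer" mode are hypothetical, and the rule that an override replaces a whole top-level key of the definition (here, the entire prompt) is an assumption. Dynamic `when` variants would additionally need an expression evaluator and are left out.

```python
import copy

# Hypothetical in-memory form of the base_consent variable shown above.
BASE_CONSENT = {
    "id": "base_consent",
    "definition": {
        "prompt": {
            "en-US": "Do you agree to participate in this interview "
                     "and to share this information?"
        },
        "scale": {"include": ["no-yes"]},
    },
    "variants": [
        {
            "for": "Self-Enumeration",
            "override": {
                "prompt": {"en-US": "Are you willing to complete the survey?"}
            },
        },
    ],
}


def apply_variants(variable, mode):
    """Return the variable's definition with overrides for `mode` applied."""
    definition = copy.deepcopy(variable["definition"])
    for variant in variable.get("variants", []):
        if variant.get("for") == mode:
            # Assumption: an override replaces whole top-level keys.
            definition.update(copy.deepcopy(variant["override"]))
    return definition
```

Calling `apply_variants(BASE_CONSENT, "Self-Enumeration")` swaps in the self-enumeration prompt and leaves the scale untouched; any other mode returns the base definition unchanged.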

and then you can pull this all together into a survey YAML:

id: helloworld
name: Hello World
blocks:
  - prompt:
      en-US: >
        This is a demonstration using a simple survey
  - name: 
      en-US: Hello World block
    elements:
      - variable: base_consent
      - variable: hw_name
      - variable: hw_age
      - variable: travel_enjoyment

Which again allows you to refer to and re-use variables across multiple surveys. So you can generate a whole "grid" of surveys- with an interviewer, for self-enumeration, for young kids/old kids, etc. which creates XLSForms which then load into Data Collection platforms and then we use their REST APIs to pipe the data back to our data warehouse.

I do strongly recommend using someone else's backend because of things like localization/language support, offline collection, case management, security, etc. So much more to discuss/share!

(note that Markdown is supported in all the en-US and other language bits... not shown in my examples, but the "down" part is still there ;-))

jdimeo commented 1 year ago

I'll drop one example of a calculated/derivative variable, that also demonstrates the ability for our end users to code unit tests into the YAML to help with data integrity and confidence in the research data! The expression language is a JavaScript-y type language that is familiar to most, with some custom functions we define (like binarize - turning a number into 0 or 1 based on a threshold).

id: well_traveled
type: Outcome
dataType: Binary
name:
  en-US: Respondent is well-traveled
definition:
  expression: >
    any(
      // Visited >= 3 states
      binarize(setCount(travel.value), 3),
      // AND/OR
      all(
        // Visited >= 2 states where one of them was Colorado
        binarize(setCount(travel.value), 2),
        travel_co.value
      )
    )
  tests:
    - expect: ## Test 1 (>= 3 w/ CO)
        value: 1
      given:
        travel: [1, 2, 3, 4]
    - expect: ## Test 2 (>= 3 w/o CO)
        value: 1
      given:
        travel: [1, 2, 4]
    - expect: ## Test 3 (>= 2 w/ CO)
        value: 1
      given:
        travel: [1, 3]
    - expect: ## Test 4 (>= 2 w/o CO)
        value: 0
      given:
        travel: [1, 2]
    - expect: ## Test 5 (<= 2 w/o CO)
        value: 0
      given:
        travel: [1]
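
For readers who want to trace the logic, here is a rough Python rendering of the expression and its five tests. It is a sketch, not the actual expression engine. It assumes binarize returns 1 when the number is at or above the threshold, that setCount counts distinct values, and that state code 3 stands for Colorado (inferred only from which tests are labelled "w/ CO"). The names `set_count`, `well_traveled`, and `COLORADO` are hypothetical.

```python
COLORADO = 3  # assumption: inferred from the tests labelled "w/ CO"


def binarize(number, threshold):
    """Return 1 if number is at or above the threshold, else 0."""
    return 1 if number >= threshold else 0


def set_count(values):
    """Count the distinct values in a multi-select response."""
    return len(set(values))


def well_traveled(travel):
    """Mirror the any(...) / all(...) structure of the YAML expression."""
    travel_co = 1 if COLORADO in travel else 0
    return int(any([
        # Visited >= 3 states
        binarize(set_count(travel), 3),
        # Visited >= 2 states where one of them was Colorado
        all([binarize(set_count(travel), 2), travel_co]),
    ]))


# (given travel, expected value) pairs copied from the YAML tests.
TESTS = [
    ([1, 2, 3, 4], 1),
    ([1, 2, 4], 1),
    ([1, 3], 1),
    ([1, 2], 0),
    ([1], 0),
]
results = [well_traveled(given) == expected for given, expected in TESTS]
```

Under these assumptions all five YAML test cases pass.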
jhelvy commented 1 year ago

Okay wow, this is a great demo of how to handle lots of complexity in the "disaggregated" framework. I am also a little more partial to going more disaggregated than not for a lot of the same reasons you mentioned:

I still think though that much of the rest of the survey content could / should be generated as .Rmd / .qmd files, mostly because they are much easier to read and preview. For example, making a very text-heavy page like a consent form page would be much nicer in a consent.qmd file that could be rendered and previewed in a browser. It's also a little closer to how Quarto websites work, where each web page is a separate file and all of them are tied together in a YAML file that defines things like the overall website styling, menu / navigation items, etc. I think it's the best of both: use the .qmd files to edit text / image heavy page content, and use YAML files to control flow logic and question definitions.

jdimeo commented 1 year ago

Yeah, so for that part I do use the Maven site plugin and a Bootstrap theme to achieve a very similar effect to create a data dictionary. We now have over 2,000 variables, dozens of surveys, > 20 countries, thus over 76,000 files are generated as part of our static site: image

We also use Mermaid and the ability to introspect any expressions or inputs to wire up an ontology/lineage flow chart for every variable. image

But Quarto is definitely better if you're not in the Maven/JVM ecosystem already!

jdimeo commented 1 year ago

GitLab CI/CD runs all variable tests and regenerates every survey (in XLSForm or in MS Word "Survey guide" form for field use) on every commit. Here's an example of a survey's documentation page: (importantly, all variants have been applied and "includes" resolved so it's documenting the survey in its "final form")

image

jhelvy commented 1 year ago

Yeah this looks like a good solution for a specific context where lots of iterations of similar surveys are needed. I'm unfamiliar with Maven and this framework. At the core, I want the following features:

Shiny, Quarto, etc. follow these features.

jhelvy commented 1 year ago

I am now leaning more towards a hybrid version of my second option. The example .Rmd file here shows one possible approach. In this layout, the user defines the entire survey in a single .Rmd file. This helps with readability and makes the overall survey writing process much more literate.

I think it would be nice if the questions could be defined in a couple of different ways. In that example, I still allow users to define questions in a separate questions.yml file, and users can insert a question using a short code, e.g. {{ question age }}. But I would also like the ability to define questions in-line with the text too. This may be desirable if some of the question text depends on other responses, e.g. "Since you chose ___ on the last question, which of the following...". Then the user could use R code to paste together responses from prior inputs into other questions, etc. I still don't know how to do this part, but the idea is there. Use questions.yml for simple, static questions, and use code chunks or some other approach to define more complex questions where running R code is important to how the question renders on the page.
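
One way the short code idea could work is a simple text substitution pass before rendering. The sketch below is Python for illustration only and is not how the package does it. The question dictionary, the placeholder output format, and the function `expand_shortcodes` are all hypothetical.

```python
import re

# Hypothetical contents of questions.yml after loading.
QUESTIONS = {
    "age": {"type": "numeric", "label": "What is your age?"},
}

# Matches short codes of the form {{ question <id> }}.
SHORTCODE = re.compile(r"\{\{\s*question\s+(\w+)\s*\}\}")


def expand_shortcodes(text, questions):
    """Replace each {{ question <id> }} with a placeholder for that question."""
    def replace(match):
        question_id = match.group(1)
        question = questions[question_id]
        return f"[{question['type']} question '{question_id}': {question['label']}]"
    return SHORTCODE.sub(replace, text)
```

A real implementation would emit the input widget rather than a bracketed placeholder, and would need to decide what to do with an unknown question id (this sketch raises a KeyError).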

If this whole thing gets built in shiny, then there should be a way to make these values reactive, meaning we can change the value of what is shown based on a previously-chosen response. I'm thinking this might be the right way to go in terms of the overall user interface to designing the survey.

jhelvy commented 11 months ago

Just dropping in here the questions Python package: https://github.com/cguardia/questions

Looks super close to what I had in mind. The fact that it's in Python is also promising as this could probably be used to render code around the forms.

jhelvy commented 3 weeks ago

We've moved pretty hard in the direction of using shiny with quarto, and it's looking promising, so I'm going to close this issue.

jdimeo commented 3 weeks ago

Unfortunately no progress from our legal department yet on open sourcing the above framework. Hopefully soon! Sounds like a different but equally great direction you're taking.