Open Edouard-Legoupil opened 1 year ago
Based on the discussion this morning, I have revised the logic in the interface.
The need is now scoped into 2 distinct functions:
A function to optimize the generation of the n surveys (aka waves) based on a list of questions
A simplex function to optimize the association between number of data collection waves and cost, based on:
The goal of this ticket is to develop the back-office function and logic behind the data collection step.
The module script is here: https://github.com/unhcr-americas/surveyDesigner/blob/main/R/mod_collection.R
From the previous stage, we have used the filters based on indicator selection and language (language label used for the country) to subset a list of questions and potential answers (if select_one or select_multiple).
In the collection stage, we need to assess whether the full questionnaire should be split into different parts, aka data collection waves, using:
interview duration (_each question within the form can be assessed with the interview duration function_),
question groups (aka a module of questions grouped between 'begin_group' and 'end_group'),
indicator requirements (_aka, based on the mapping, multiple questions potentially spread over multiple modules, together with linked questions required for indicator disaggregation_),
data collection mode, as it impacts the sequence of the questions (in CAPI, sensitive questions tend to be kept at the end, while it is the contrary for CATI), and
an estimation of the response rate based on average interview duration. The longer a survey is, the higher the risk of dropout, so the impact of designing a long survey can be estimated by the cost of reaching out to people whose information will not be recorded (basically, total cost per interview would be a function of response rate). See this publication - Optimizing Data Collection Interventions to Balance Cost and Quality in a Sequential Multimode Survey
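To make the splitting criteria more concrete, here is a minimal sketch in Python (for illustration only, it is not the code in mod_collection.R). It assumes each question group already has an estimated duration in minutes and packs the groups, in questionnaire order, into waves that stay under a maximum interview duration. The names `QuestionGroup`, `split_into_waves` and `max_minutes`, as well as the durations, are hypothetical. Indicator requirements and mode-dependent ordering are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class QuestionGroup:
    name: str       # label of the begin_group / end_group module
    minutes: float  # estimated interview duration for the whole group


def split_into_waves(groups, max_minutes):
    """Greedy split: fill a wave in order until the next group would exceed the cap.

    Groups are never broken up; a single group longer than the cap
    ends up alone in its own wave.
    """
    waves, current, used = [], [], 0.0
    for group in groups:
        if current and used + group.minutes > max_minutes:
            waves.append(current)
            current, used = [], 0.0
        current.append(group)
        used += group.minutes
    if current:
        waves.append(current)
    return waves


# Hypothetical durations, for illustration only.
groups = [
    QuestionGroup("household", 8),
    QuestionGroup("education", 6),
    QuestionGroup("livelihoods", 10),
    QuestionGroup("protection", 7),
]
waves = split_into_waves(groups, max_minutes=15)
wave_names = [[g.name for g in wave] for wave in waves]
# wave_names == [["household", "education"], ["livelihoods"], ["protection"]]
```

A greedy pass like this keeps the original question order, which matters if the sequence depends on the data collection mode. A real implementation would also need to keep together the questions required by the same indicator.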
We shall also estimate an operational budget, based on costing inputs (aka enumeration capacity and total cost per interview) and various respondent sample size thresholds (500, 1000, 5000).
This should be done by assessing the current decision input and simulating the results of other alternatives.
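As a rough sketch of what such a simulation could look like (in Python, for illustration only): the snippet below assumes a hypothetical linear dropout model where the response rate falls with interview duration, and a flat cost per interview attempt. The functions `response_rate` and `budget`, the parameters `base_rate` and `drop_per_minute`, and all the numbers are assumptions, not values from the project. Enumeration capacity is not modelled here.

```python
import math


def response_rate(minutes, base_rate=0.9, drop_per_minute=0.01):
    """Hypothetical dropout model: the rate falls linearly with duration, floored at 10%."""
    return max(0.1, base_rate - drop_per_minute * minutes)


def budget(sample_size, wave_minutes, cost_per_attempt):
    """Cost of obtaining `sample_size` completed interviews in every wave.

    Attempts that do not lead to a completed interview are still paid for,
    so the cost per completed interview rises as the response rate drops.
    """
    total = 0.0
    for minutes in wave_minutes:
        attempts = math.ceil(sample_size / response_rate(minutes))
        total += attempts * cost_per_attempt
    return total


# Current decision versus one alternative, for each sample size threshold.
alternatives = {"1 wave x 30 min": [30], "2 waves x 15 min": [15, 15]}
results = {
    (n, label): budget(n, waves, cost_per_attempt=12.0)
    for n in (500, 1000, 5000)
    for label, waves in alternatives.items()
}
for (n, label), cost in results.items():
    print(f"n={n:>5}  {label:<18} {cost:>10.0f}")
```

The shape of the dropout curve drives the whole comparison, so it would need to be calibrated on observed response rates before the numbers mean anything.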
Client - Validation
Dev - Tech