rapidsurveys / odkr

Open Data Kit R API
http://rapidsurveys.io/odkr/
GNU General Public License v3.0

issue with `expandMultChoice()` function #46

Closed ernestguevarra closed 3 years ago

ernestguevarra commented 3 years ago

Issue sent via email on 8 January 2021 at 14:45

Hi Ernest,

I hope this email finds you well. I have a question about the expandMultChoice function.

I see that the commas cause a problem with my data. Ex:

Line 1 has answers 01, 04
Line 2 has answers 01, 04, 03, 02

The function will detect 6 different answers and so will create 6 columns: 01, 02 03, 04 04,

More problematic, the 04 and 04, pop up a total of 3 times, and I don't understand why:

  01, 02 03, 04 04,
1 1 0 0 1 0
2 1 1 1 1 1

Do you have any idea how to avoid this?

Many thanks, Tristan

ernestguevarra commented 3 years ago

I am not very clear on what is being asked. I would require a reproducible example as well.

ernestguevarra commented 3 years ago

Reply sent via email on 9 January 2021 at 13:25 GMT

Hi Tristan,

Apologies for the delay in reply. I was very busy yesterday. I am looking at your issue now.

I cannot quite understand what your question is. Would it be possible for you to give me a reproducible example? You will usually have to apply the expandMultChoice() function to a dataset that you have already pulled out of ODK. Are you able to share a subset of that data for me to test? I know data is sensitive. I can guarantee that I will delete the file from my system once I have seen what the issue is and once I have figured out a solution (if needed).

If you want to follow the progress of this issue, please track it at https://github.com/rapidsurveys/odkr/issues/46

Thanks.

Best,

Ernest

ernestguevarra commented 3 years ago

Reply from Tristan sent via email on 9 January 2021 at 17:34

This is part of the database, with only 2 observations, before any action is done.

The 'data-Section_9_REVENUS.-INCSOURCES' is the multiple choice question I was talking about (sheet 1). When using the expandMultChoice function on it, '04' and '04,' are seen as different answers.

Thanks, Tristan

Recreated vector of responses from dataset provided

x <- c("01, 04", "01, 04, 03, 02")

Expected behaviour/result

When expandMultChoice() is used, expect this result:

01 02 03 04
1 0 0 1
1 1 1 1

However, the result is:

01, 02 03, 04 04,
1 0 0 1 0
1 1 1 1 1

ernestguevarra commented 3 years ago

Reply from Ernest via email on 9 January 2021 at 18:56

Hi,

Thanks for sharing this. I found the issue you are facing and was able to replicate it. I think you are implementing the function this way:

expandMultChoice(answers = c("01, 04", "01, 04, 03, 02"))

and this produces the following result:

01, 02 03, 04 04,
1 0 0 1 0
1 1 1 1 1

as you have indicated above.

What is happening here is that the function uses the space between the answers in each set as the separator by which to split the responses. This is the default behaviour. This is why, in the result, the labels on top include the , for values 01, 03 and 04. Now, you will notice that in the first set of responses, 04 is not followed by a , and so it is not the same as the character value 04,. Hence the extra column for this character value.

This is expected behaviour from the function when it is implemented using default arguments (as shown above).
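
To make the mechanism concrete, here is a minimal base R sketch that imitates the behaviour described above. It does not call odkr and is not the package's actual implementation. It assumes that the labels come from splitting on a space and that each label is then searched for as a literal substring of every response; that second assumption is chosen only because it reproduces the table reported in this thread.

```r
# Responses recreated from the dataset shared in this thread
x <- c("01, 04", "01, 04, 03, 02")

# Splitting on a single space keeps the commas attached to the tokens
tokens <- strsplit(x, split = " ", fixed = TRUE)

# Unique tokens become the column labels; method = "radix" sorts in the C locale
labels <- sort(unique(unlist(tokens)), method = "radix")

# Assumption: each label is matched as a literal substring of each response
default_like <- sapply(labels, function(lb) as.integer(grepl(lb, x, fixed = TRUE)))

print(default_like)
```

Under this assumption, the second response is flagged for both 04 and 04, because the text "04," also contains "04". That would account for 04 appearing three times in total across the two columns.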

To get a more appropriate result, the choices parameter needs to be specified as follows:

expandMultChoice(answers = c("01, 04", "01, 04, 03, 02"), choices = c("01", "02", "03", "04"))

What you are basically telling the function is to split the answers into the choices that you have specified. The function will then just look for those values within the responses (excluding the ,). The result of using this is:

01 02 03 04
1 0 0 1
1 1 1 1

which is the result that you are expecting.
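
The effect of supplying choices can be imitated in base R as well. The sketch below is not odkr's code; it assumes that each supplied choice is simply searched for as a literal substring of each response. The second half shows an alternative that is not suggested in this thread: stripping the commas first, so that splitting on a space yields clean tokens.

```r
# Responses recreated from the dataset shared in this thread
x <- c("01, 04", "01, 04, 03, 02")
choices <- c("01", "02", "03", "04")

# Assumption: a choice counts as selected if it occurs anywhere in the response
with_choices <- sapply(choices, function(ch) as.integer(grepl(ch, x, fixed = TRUE)))
print(with_choices)

# Alternative (not from the thread): remove commas before splitting on spaces
x_clean <- gsub(",", "", x, fixed = TRUE)
clean_tokens <- strsplit(x_clean, split = " ", fixed = TRUE)
clean_labels <- sort(unique(unlist(clean_tokens)), method = "radix")
print(clean_labels)
```

Substring matching works in this sketch because no choice code is contained in another. With codes such as 1 and 11 it could produce false positives, so exact matching on cleaned tokens may be the safer route in that case.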

I hope this helps.