mattroumaya / surveymonkey

Access your SurveyMonkey data directly from R!
https://mattroumaya.github.io/surveymonkey/

Duplicates and errors subsetting columns #116

Open cswingle opened 1 year ago

cswingle commented 1 year ago

The latest survey created by our team seems to have issues when trying to parse the survey object. The code looks like this:

survey_object <- fetch_survey_obj(svy_id)
survey_df <- parse_survey(
  survey_object,
  fix_duplicates = "error"
)

With fix_duplicates = "error", I get Error: There are duplicated rows in the responses. This is unexpected: the only submissions at that point were two test responses I created from different computers, with different answers.

With fix_duplicates = "drop" I get this:

Error in `out[, col_names]`:
! Can't subset columns that don't exist.
✖ Columns `image`, `survey_id`, `collector_id`, `response_id`, `date_created`, etc. don't exist.
Run `rlang::last_error()` to see where the error occurred.
Warning messages:
1: In duplicate_drop(x) :
   There are 22 duplicate responses, duplicates are dropped in
       the results. Set fix_duplicates = 'keep' to retain them.
2: Outer names are only allowed for unnamed scalar atomic inputs

I get the same error with fix_duplicates = "keep", except that the warning comes from duplicate_keep(x).

The only thing I can think of that might be different is that this survey has one question that shows a series of images, from which the respondent chooses one.

If there are R objects I can send you, or debugging steps I can go through, I'm happy to try. I did load a number of the internal functions into my environment and stepped through parse_survey to see what was going wrong, but I couldn't work out exactly what each step was doing.

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dbplyr_2.2.1            RPostgres_1.4.4         surveymonkey_0.1.0.9000
 [4] glue_1.6.2              lubridate_1.8.0         forcats_0.5.2          
 [7] stringr_1.4.1           dplyr_1.0.10            purrr_0.3.5            
[10] readr_2.1.3             tidyr_1.2.1             tibble_3.1.8           
[13] ggplot2_3.3.6           tidyverse_1.3.2        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9          pillar_1.8.1        compiler_4.2.2     
 [4] cellranger_1.1.0    tools_4.2.2         bit_4.0.4          
 [7] googledrive_2.0.0   jsonlite_1.8.4      lifecycle_1.0.3    
[10] gargle_1.2.1        gtable_0.3.1        pkgconfig_2.0.3    
[13] rlang_1.0.6         reprex_2.0.2        DBI_1.1.3          
[16] cli_3.4.1           haven_2.5.1         xml2_1.3.3         
[19] withr_2.5.0         httr_1.4.4          generics_0.1.3     
[22] vctrs_0.5.1         fs_1.5.2            hms_1.1.2          
[25] bit64_4.0.5         googlesheets4_1.0.1 grid_4.2.2         
[28] tidyselect_1.2.0    R6_2.5.1            fansi_1.0.3        
[31] readxl_1.4.1        blob_1.2.3          tzdb_0.3.0         
[34] modelr_0.1.9        magrittr_2.0.3      backports_1.4.1    
[37] scales_1.2.1        ellipsis_0.3.2      rvest_1.0.3        
[40] assertthat_0.2.1    colorspace_2.0-3    utf8_1.2.2         
[43] stringi_1.7.8       munsell_0.5.0       broom_1.0.1        
 [46] crayon_1.5.2

mattroumaya commented 1 year ago

Hey @cswingle, sorry for the late reply!

Unfortunately, these issues are hard to troubleshoot because survey designs vary quite a bit. Later today I'll try creating a branch with an additional option for fix_duplicates that skips duplicate-response handling entirely and, hopefully, returns a parsed survey.

A second issue, which may be harder to resolve, is that I don't believe the package fully supports image questions yet. I no longer have a premium account, which makes it even harder to test and add new features, but hopefully the approach above will be enough to get your data out.

mattroumaya commented 1 year ago

Possibly related to #104

cswingle commented 1 year ago

@mattroumaya, I could email you the JSON returned by the survey/:id/details and surveys/:id/details API queries (and any other endpoint I have access to) if that would help diagnose the issue. The first two survey responses are dummy responses, so I wouldn't be sharing anything real beyond the survey's structure and a couple of test responses.

mattroumaya commented 1 year ago

@cswingle I'm definitely happy to take a look! My email is matthewroumaya@gmail.com. I'm a bit busy this week, but I'm hoping to take a closer look tomorrow.

mattroumaya commented 1 year ago

@cswingle I have a pull request ready for you to test. Whenever you get the chance, you can run:

devtools::install_github('mattroumaya/surveymonkey@47c1505773521d941a414ded769ef141037ac94c')

library(surveymonkey)
library(magrittr)  # provides the %>% pipe

survey_df <- 123456789 %>%   # replace with your survey ID
  fetch_survey_obj() %>%
  parse_survey(fix_duplicates = 'none')

You might see a warning thrown by pivot_longer() inside parse_survey(), but this should hopefully let you pull your data and resolve the duplicates after the survey is parsed.
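As a minimal base-R sketch of resolving duplicates after parsing: the data frame below is a hypothetical stand-in for the survey_df returned by parse_survey(fix_duplicates = 'none'), and treating either fully identical rows or repeated response_id values as duplicates is an assumption, not the package's own rule.

```r
# Hypothetical stand-in for a parsed survey; the real survey_df comes from
# parse_survey(fix_duplicates = 'none') and has many more columns.
survey_df <- data.frame(
  response_id = c("r1", "r1", "r2"),
  q1 = c("a", "a", "b"),
  stringsAsFactors = FALSE
)

# Number of rows that exactly repeat an earlier row.
n_dupes <- sum(duplicated(survey_df))

# Option 1: drop rows that are identical across every column.
deduped_rows <- survey_df[!duplicated(survey_df), ]

# Option 2 (assumes response_id identifies one response): keep the
# first row seen for each response_id.
deduped_ids <- survey_df[!duplicated(survey_df$response_id), ]
```

Before dropping anything, survey_df[duplicated(survey_df) | duplicated(survey_df, fromLast = TRUE), ] shows every copy of each repeated row, which may help tell which columns (for example, the image question) are producing the extra rows.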

cswingle commented 1 year ago

Thanks! I tried the pull request but got errors similar to what I was seeing before:

:> survey_df <- 123456789 %>% fetch_survey_obj %>% parse_survey(fix_duplicates = 'none')
You have 496 requests left today before you hit the limit
You have 495 requests left today before you hit the limit
New names:
• `s3_key` -> `s3_key...1`
• `s3_key` -> `s3_key...2`
• `s3_key` -> `s3_key...3`
• `s3_key` -> `s3_key...4`
• `url` -> `url...5`
• `url` -> `url...6`
• `url` -> `url...7`
• `url` -> `url...8`
• `alt_text` -> `alt_text...9`
• `alt_text` -> `alt_text...10`
• `alt_text` -> `alt_text...11`
• `alt_text` -> `alt_text...12`
• `s3_key` -> `s3_key...13`
• `s3_key` -> `s3_key...14`
• `s3_key` -> `s3_key...15`
• `s3_key` -> `s3_key...16`
• `url` -> `url...17`
• `url` -> `url...18`
• `url` -> `url...19`
• `url` -> `url...20`
• `alt_text` -> `alt_text...21`
• `alt_text` -> `alt_text...22`
• `alt_text` -> `alt_text...23`
• `alt_text` -> `alt_text...24`
• `s3_key` -> `s3_key...25`
• `s3_key` -> `s3_key...26`
• `url` -> `url...27`
• `url` -> `url...28`
• `alt_text` -> `alt_text...29`
• `alt_text` -> `alt_text...30`
• `s3_key` -> `s3_key...31`
• `s3_key` -> `s3_key...32`
• `url` -> `url...33`
• `url` -> `url...34`
• `alt_text` -> `alt_text...35`
• `alt_text` -> `alt_text...36`
• `s3_key` -> `s3_key...37`
• `s3_key` -> `s3_key...38`
• `s3_key` -> `s3_key...39`
• `url` -> `url...40`
• `url` -> `url...41`
• `url` -> `url...42`
• `alt_text` -> `alt_text...43`
• `alt_text` -> `alt_text...44`
• `alt_text` -> `alt_text...45`
• `s3_key` -> `s3_key...46`
• `s3_key` -> `s3_key...47`
• `url` -> `url...48`
• `url` -> `url...49`
• `alt_text` -> `alt_text...50`
• `alt_text` -> `alt_text...51`
Error in `out[, col_names]`:
! Can't subset columns that don't exist.
✖ Columns `image`, `survey_id`, `collector_id`, `response_id`, `date_created`, etc. don't exist.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
Outer names are only allowed for unnamed scalar atomic inputs 

Here's the last_trace():

+> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `out[, col_names]`:
! Can't subset columns that don't exist.
✖ Columns `image`, `survey_id`, `collector_id`, `response_id`, `date_created`, etc. don't exist.
---
Backtrace:
     ▆
  1. ├─510188122 %>% fetch_survey_obj %>% ...
  2. ├─surveymonkey::parse_survey(., fix_duplicates = "none")
  3. │ ├─out[, col_names]
  4. │ └─tibble:::`[.tbl_df`(out, , col_names)
  5. │   └─tibble:::vectbl_as_col_location(...)
  6. │     ├─tibble:::subclass_col_index_errors(...)
  7. │     │ └─base::withCallingHandlers(...)
  8. │     └─vctrs::vec_as_location(j, n, names, call = call)
  9. └─vctrs (local) `<fn>`()
 10.   └─vctrs:::stop_subscript_oob(...)
 11.     └─vctrs:::stop_subscript(...)
 12.       └─rlang::abort(...)
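The `...1`, `...2` suffixes in the "New names:" output above match the "unique" name-repair scheme that tibble delegates to vctrs, which appends `...` plus the column position to each repeated name. A minimal illustration, assuming vctrs is installed (it is a tibble dependency); the input names are taken from the log, and the thread does not confirm exactly where inside parse_survey() the repair happens:

```r
# Repeated field names like those reported for the image question.
field_names <- c("s3_key", "s3_key", "url", "url")

# "unique" repair makes duplicates distinct by appending ...<position>;
# quiet = TRUE suppresses the "New names:" message.
repaired <- vctrs::vec_as_names(field_names, repair = "unique", quiet = TRUE)
```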