kenhanscombe / ukbtools

An R package to manipulate and explore UK Biobank data
https://kenhanscombe.github.io/ukbtools/

Error: Length of logical index vector for `[` must equal number of columns (or 1): #18

Closed (moldach closed this issue 4 years ago)

moldach commented 5 years ago

I'm getting an error when trying to use ukb_context on a subgroup of interest.

my_ukb_data <- ukb_df("ukb24898", path = "/share/projects/uk_biobank/pheno_data")
my_ukb_key <- ukb_df_field("ukb24898", path = "/share/projects/uk_biobank/pheno_data")

One thing I noticed is that ukb_df_field() appends uses_datacoding_... to all of the variable names, which seems a bit odd (it's not what I see in the vignette). Perhaps this is because there are multiple UDIs for each Description (e.g. "Never eat eggs, dairy, wheat, sugar (pilot) Uses data-coding 100672" has four UDIs: 10855-0.0, 10855-0.1, 10855-0.2, 10855-0.3)?

The error I'm getting is from the ukb_context() function:

heavy_abuse_subgroup <- (my_ukb_data$physically_abused_by_family_as_a_childuses_datacoding_532_f20488_0_0 == "Very often true")
ukb_context(my_ukb_data, nonmiss.var = heavy_abuse_subgroup)

Error: Length of logical index vector for `[` must equal number of columns (or 1):
* `.data` has 3177 columns
* Index vector has length 502543

The phenodata we paid for (41975) apparently does not have body_mass_index or BMI so I cannot try what you have in the vignette. I can however provide you with the data dictionary if we need to troubleshoot using another variable.

sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /share/apps/anaconda2/lib/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] feather_0.3.3   ukbtools_0.11.2 usethis_1.4.0   devtools_2.0.2 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         plyr_1.8.4         compiler_3.5.3     pillar_1.3.1       iterators_1.0.10   prettyunits_1.0.2 
 [7] remotes_2.0.4      tools_3.5.3        testthat_2.1.1     digest_0.6.18      packrat_0.5.0      pkgbuild_1.0.3    
[13] pkgload_1.0.2      memoise_1.1.0.9000 tibble_2.1.1       gtable_0.3.0       pkgconfig_2.0.2    rlang_0.3.4       
[19] foreach_1.4.4      cli_1.1.0          rstudioapi_0.10    parallel_3.5.3     xfun_0.6           knitr_1.22        
[25] stringr_1.4.0      withr_2.1.2        dplyr_0.8.0.1      hms_0.4.2          desc_1.2.0         fs_1.2.7          
[31] rprojroot_1.3-2    grid_3.5.3         tidyselect_0.2.5   data.table_1.12.2  glue_1.3.1         R6_2.4.0          
[37] processx_3.3.0     XML_3.98-1.19      sessioninfo_1.1.1  tidyr_0.8.3        readr_1.3.1        callr_3.2.0       
[43] purrr_0.3.2        ggplot2_3.1.1      magrittr_1.5       codetools_0.2-16   backports_1.1.4    scales_1.0.0      
[49] ps_1.3.0           assertthat_0.2.1   colorspace_1.4-1   stringi_1.4.3      doParallel_1.0.14  lazyeval_0.2.2    
[55] munsell_0.5.0      crayon_1.3.4  
kenhanscombe commented 5 years ago

Please install the dev version v0.11.2.9000:

devtools::install_github("kenhanscombe/ukbtools", dependencies = TRUE)

I included a regex to remove the "usesdatacoding..." from column names – UKB previously had a hyphen between "data" and "coding" 😒
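As a rough illustration of this kind of cleanup, the sketch below strips a "uses data-coding" fragment from column names with a regular expression. The pattern and the helper name `strip_datacoding` are assumptions made for illustration; this is not the regex used inside ukbtools.

```r
# Hypothetical helper (not part of ukbtools): remove a
# "uses_datacoding_<id>" fragment from column names.
# The optional underscores (`_?`) also tolerate a "data_coding" spelling.
strip_datacoding <- function(x) {
  gsub("uses_?data_?coding_[0-9]+", "", x)
}

cols <- c(
  "physically_abused_by_family_as_a_childuses_datacoding_532_f20488_0_0",
  "sex_f31_0_0"  # example of a name that needs no cleaning
)
cleaned <- strip_datacoding(cols)
print(cleaned)
```

On a real dataset, the same helper could be applied with `names(my_ukb_data) <- strip_datacoding(names(my_ukb_data))`.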

Re. ukb_context, you might want to check that the defaults for the demographic variables pick up the relevant variables in your dataset, e.g.,

select(my_ukb_data, matches("^sex.*0_0"))

Re. BMI, if you have height and weight you can construct it yourself. Alternatively, you can usually just ask the UKB (access@ukbiobank.ac.uk) for any variables you forgot to add to your basket.
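A minimal sketch of constructing BMI from height and weight, assuming height is recorded in centimetres and weight in kilograms. The column names `standing_height_cm` and `weight_kg` are hypothetical stand-ins; the actual names depend on the dataset.

```r
# Toy data; the column names are hypothetical stand-ins for the
# height and weight columns in a real UKB dataset.
toy <- data.frame(
  standing_height_cm = c(175, 160),
  weight_kg = c(70, 64)
)

# BMI = weight (kg) / height (m)^2; height is converted from cm to m.
toy$bmi <- toy$weight_kg / (toy$standing_height_cm / 100)^2
print(round(toy$bmi, 1))
```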

Let me know how you get on.

Ken

moldach commented 5 years ago

Hi Ken,

I downloaded the dev version:

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /share/apps/anaconda2/lib/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] needs_0.0.3          dplyr_0.8.0.1        feather_0.3.3        ukbtools_0.11.2.9000 usethis_1.4.0        devtools_2.0.2      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         plyr_1.8.4         compiler_3.5.3     pillar_1.3.1       iterators_1.0.10   prettyunits_1.0.2  remotes_2.0.4     
 [8] tools_3.5.3        testthat_2.1.1     digest_0.6.18      packrat_0.5.0      pkgbuild_1.0.3     pkgload_1.0.2      memoise_1.1.0.9000
[15] tibble_2.1.1       gtable_0.3.0       pkgconfig_2.0.2    rlang_0.3.4        foreach_1.4.4      cli_1.1.0          rstudioapi_0.10   
[22] parallel_3.5.3     stringr_1.4.0      withr_2.1.2        hms_0.4.2          desc_1.2.0         fs_1.2.7           rprojroot_1.3-2   
[29] grid_3.5.3         tidyselect_0.2.5   data.table_1.12.2  glue_1.3.1         R6_2.4.0           processx_3.3.0     XML_3.98-1.19     
[36] sessioninfo_1.1.1  tidyr_0.8.3        readr_1.3.1        callr_3.2.0        purrr_0.3.2        ggplot2_3.1.1      magrittr_1.5      
[43] codetools_0.2-16   backports_1.1.4    scales_1.0.0       ps_1.3.0           assertthat_0.2.1   colorspace_1.4-1   stringi_1.4.3     
[50] doParallel_1.0.14  lazyeval_0.2.2     munsell_0.5.0      crayon_1.3.4   

but some column names still contain uses_datacoding, e.g. my_ukb_data$physically_abused_by_family_as_a_childuses_datacoding_532_f20488_0_0

I've checked to see if the defaults are getting picked up:

dplyr::select(my_ukb_data, matches("^sex.*0_0"))
dplyr::select(my_ukb_data, matches("^age_when_attended_assessment_centre.*0_0"))
dplyr::select(my_ukb_data, matches("^townsend_deprivation_index_at_recruitment.*0_0"))
dplyr::select(my_ukb_data, matches("^ethnic_background.*0_0"))
dplyr::select(my_ukb_data, matches("^uk_biobank_assessment_centre.*0_0"))

All of the above returned columns except for:

dplyr::select(my_ukb_data, matches("^current_employment_status.*0_0"))

Therefore, I tried setting that variable to NULL.

ukb_context(my_ukb_data, nonmiss.var = heavy_abuse_subgroup, employment.var = NULL)

But I get the following error:

Error in `[.data.frame`(data, , nonmiss.var) : undefined columns selected
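
The call shown in this error, `[.data.frame`(data, , nonmiss.var), suggests that nonmiss.var is used as a column index. A logical vector with one element per row cannot index columns, which would be consistent with both errors in this thread. The sketch below reproduces the behaviour on toy data in base R. That nonmiss.var expects a column name rather than a logical vector is an inference from the error message, not something confirmed in the thread.

```r
toy <- data.frame(id = 1:5, score = c(3, 8, 1, 9, 7))
in_subgroup <- toy$score > 5   # logical, one element per row

# Row indexing with the per-row logical vector works.
subgroup_rows <- toy[in_subgroup, ]

# Column indexing with the same vector fails, because its length (5)
# does not match the number of columns (2).
col_error <- tryCatch(
  toy[, in_subgroup],
  error = function(e) conditionMessage(e)
)
print(col_error)

# Column indexing by name works.
score_col <- toy[, "score"]
```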
kenhanscombe commented 5 years ago

Unfortunately, the way I've written ukb_context, you need all the demographic variables (I might change this in the future if I find time). Try a broader search:

dplyr::select(my_ukb_data, matches("employment"))

If you don't have the current employment status variables, try requesting them from the UKB.
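
To check all the demographic patterns in one pass, something like the following sketch may help. The patterns are the ones tried earlier in this thread; the helper `has_match` and the toy column names are hypothetical.

```r
# Regex patterns taken from the checks earlier in this thread.
patterns <- c(
  sex = "^sex.*0_0",
  age = "^age_when_attended_assessment_centre.*0_0",
  socioeconomic = "^townsend_deprivation_index_at_recruitment.*0_0",
  ethnicity = "^ethnic_background.*0_0",
  employment = "^current_employment_status.*0_0",
  centre = "^uk_biobank_assessment_centre.*0_0"
)

# Hypothetical helper: TRUE where at least one column name matches.
has_match <- function(col_names, patterns) {
  vapply(patterns, function(p) any(grepl(p, col_names)), logical(1))
}

# Toy column names standing in for names(my_ukb_data).
toy_names <- c("sex_f31_0_0", "ethnic_background_f21000_0_0")
found <- has_match(toy_names, patterns)

# Names of the demographic variables with no matching column.
print(names(found)[!found])
```

With real data, `has_match(names(my_ukb_data), patterns)` would show which demographic variables are missing before calling ukb_context.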

Re. the remaining variants of "uses data coding", I'll try to clean that up.

Let me know how you get on.

Ken