kenhanscombe / ukbtools

An R package to manipulate and explore UK Biobank data
https://kenhanscombe.github.io/ukbtools/

ukb_centre : Error: cannot allocate vector of size 15.0 Gb #16

Closed tra6sdc closed 5 years ago

tra6sdc commented 5 years ago

Hello, I am using your ukb_centre function to give better descriptive names to the assessment centres. When I do this I get the following output.

> ukb_centre(my_ukb_data, centre.var = "uk_biobank_assessment_centre_f54_0_0") 
Error: cannot allocate vector of size 15.0 Gb 
> str(my_ukb_data$uk_biobank_assessment_centre_f54_0_0)  
chr [1:502536] "11017" "11007" "11011" "11009" "11011" "11021" "11016" "11018" "11010" "11016" ...
> str(ukbcentre)
'data.frame':   24 obs. of  2 variables:
 $ code  : int  11012 11021 11011 11008 11003 11024 11020 11005 11004 11018 ...
 $ centre: chr  "Barts" "Birmingham" "Bristol" "Bury" ...
 - attr(*, "spec")=
  .. cols(
  ..   code = col_integer(),
  ..   centre = col_character()
  .. )
 

In the past I have found that this kind of error comes from a mismatch between the types of the variables being matched on (in inner or outer joins?). Here I note that in my_ukb_data the assessment centre is a character string, whilst in ukbcentre it is an int. I thought that the good work you have done with ukb_df/ukb_context might have also dealt with this, but possibly not? Thanks.
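A minimal sketch of how the suspected type mismatch could be checked, using base R only. The vectors `my_codes` and `lookup_codes` are hypothetical stand-ins for `my_ukb_data$uk_biobank_assessment_centre_f54_0_0` and `ukbcentre$code`. This is not how ukbtools handles the column internally, and the thread does not confirm that the types caused the error.

```r
# Toy stand-ins for the two key columns in the thread (assumption: not real data).
my_codes <- c("11011", "11021", "11012", "11011")  # character, as in my_ukb_data
lookup_codes <- c(11012L, 11021L, 11011L, 11008L)  # integer, as in ukbcentre$code

# Compare the storage types of the two keys.
class(my_codes)      # "character"
class(lookup_codes)  # "integer"

# Coerce the character codes to integer so both keys share a type.
my_codes_int <- as.integer(my_codes)

# Check that every code is present in the lookup before matching or joining.
all_present <- all(my_codes_int %in% lookup_codes)
all_present
```

If `all_present` is `FALSE`, some codes have no entry in the lookup and would come back as `NA` after matching.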

kenhanscombe commented 5 years ago

The function ukb_centre should return a dataframe with a new variable, ukb_centre, containing the named test centres (Leeds, Glasgow, etc.) as the second column of the dataframe.

Unfortunately, I can't reproduce your error. How big is your my_ukb_data?

ukb_context calls ukb_centre. Does the centre distribution plot get labelled correctly when using ukb_context?

Note. ukbcentre is a dataset I've included with the package, which is just a lookup table for "code" value and "centre" name. The function ukb_centre uses the ukbcentre lookup table. (Probably not the best naming convention ...)
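To illustrate what a code-to-name lookup of this kind does, here is a rough sketch in base R. The four rows of `centre_lookup` are taken from the `str(ukbcentre)` output earlier in the thread. The `toy` data frame, the `eid` column, and the `centre_name` column are hypothetical, and `match()` is used here for illustration only, not as a description of how ukb_centre is implemented.

```r
# Four-row excerpt of the lookup table, using values shown in the str() output.
centre_lookup <- data.frame(
  code = c(11012L, 11021L, 11011L, 11008L),
  centre = c("Barts", "Birmingham", "Bristol", "Bury"),
  stringsAsFactors = FALSE
)

# Hypothetical toy data with the assessment centre stored as character.
toy <- data.frame(
  eid = 1:3,
  uk_biobank_assessment_centre_f54_0_0 = c("11011", "11021", "11008"),
  stringsAsFactors = FALSE
)

# match() gives, for each code, its row position in the lookup table.
idx <- match(
  as.integer(toy$uk_biobank_assessment_centre_f54_0_0),
  centre_lookup$code
)

# Use those positions to pull the centre names into a new column.
toy$centre_name <- centre_lookup$centre[idx]
toy$centre_name
```

Codes that are missing from the lookup table would produce `NA` in `centre_name` rather than an error.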

tra6sdc commented 5 years ago

Yes Ken, ukb_context works ok with the assessment centre bar chart. I'll have another look tomorrow.



tra6sdc commented 5 years ago

my_ukb_data <- ukb_centre(my_ukb_data, centre.var = "uk_biobank_assessment_centre_f54_0_0")

Worked fine.
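The thread does not say why the assigned call succeeded where the bare call failed. The only visible difference is that the result is now stored rather than auto-printed at the console. Whether that explains the allocation error is an assumption. The sketch below shows the assign-then-inspect pattern on toy data, with `add_flag` as a hypothetical stand-in for a function that returns a modified data frame.

```r
# Hypothetical stand-in for a function that returns a modified data frame.
add_flag <- function(df) {
  df$flag <- TRUE
  df
}

toy <- data.frame(x = 1:5)

# Unassigned: R would auto-print the whole return value and leave `toy` unchanged.
# add_flag(toy)

# Assigned: the result is stored and can be inspected in small pieces.
toy <- add_flag(toy)
str(toy)
head(toy, 3)
```

With a data frame of roughly half a million rows, inspecting the result with `str()` or `head()` avoids printing the whole object.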