Closed: arborzhang closed this issue 11 months ago
Yea, I was debugging the issue last night, and I think it's an issue Daniel and I encountered earlier: if a variable name has special characters (comma, slash, etc.), it affects how it works on the command line (but not in the R Console). I will have to find some way to escape them or otherwise make it work, but that might take some time. In the meantime, you can work on the protocol and on clarifying the variables you are interested in. I still think you have waaayyy too many variables that I guarantee you won't end up using, but we can see how it goes :stuck_out_tongue:
I deleted some variables and tried the process on the UKB RAP again, but still got errors.
I am afraid there are several steps I might be misunderstanding, and I would appreciate some clarification. Please see my questions below.
# Keep only the necessary variables for RAP -------------------------------
# the necessary variables are kept in the `data-raw/project-variables.csv`
library(magrittr)
# After the variables have been properly selected in the `data-raw/project-variables.csv`
# file, run this function so that only the selected variables are kept in the
# `data-raw/rap-variables.csv` file. This file has the exact variable names used
# by RAP that we need in order to create the project-specific dataset. After
# running this function, review the changes in Git and add and commit the changed
# files into the history.
# Uncomment if you messed up and need to start over.
# ukbAid::project_variables %>%
#   readr::write_csv(here::here("data-raw/project-variables.csv"))
# Update if necessary.
ukbAid::rap_variables %>%
  readr::write_csv(here::here("data-raw/rap-variables.csv"))
ukbAid::subset_rap_variables(instances = 0:9)
I updated your comments.

- `magrittr`: it is a package that provides the pipe, `%>%`.
- The `rap-variables.csv` file gives the names of the variables needed for extracting from the RAP. They are slightly different from the ones in `project-variables.csv`; for instance, they have `_i1` or `_i2` at the end, which indicates the collection visit. Basically, you edit `project-variables.csv` and use `ukbAid::subset_rap_variables(instances = 0:9)` to update the RAP variables from the project variables list.
- `ukbAid::subset_rap_variables(instances = 0:9)`: this is what takes the variables you select and delete in `project-variables.csv` and updates them so that RAP knows which variables to select from their own database.
- `project-variables.csv`: if you select one variable (one row), then in `rap-variables.csv` there will most likely be multiple rows for that one variable, one for each instance, since that variable actually has up to 9 other variables for each timepoint (e.g. `p21353_i0`, `p21353_i1`, `p21353_i2`, etc).

It totally makes sense. Thank you so much, this gives me a better understanding of the different steps 👍
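To make the "one row becomes several rows" point concrete, here is a rough base R sketch. `expand_instances()` is a hypothetical helper written only for illustration, not the actual ukbAid code, and it assumes every field simply gets an `_i<instance>` suffix for each visit, which will not hold for every real variable.

```r
# Hypothetical helper, for illustration only (not part of ukbAid).
# Assumes each field id gets an `_i<instance>` suffix per collection visit.
expand_instances <- function(field_ids, instances = 0:3) {
  # expand.grid() varies the first column fastest, so all instances of
  # one field id stay together.
  expanded <- expand.grid(
    instance = instances,
    id = field_ids,
    stringsAsFactors = FALSE
  )
  data.frame(
    field_id = paste0(expanded$id, "_i", expanded$instance),
    id = expanded$id,
    stringsAsFactors = FALSE
  )
}

# Two selected project variables turn into eight RAP-style rows.
rap_like <- expand_instances(c("p21002", "p21353"))
print(rap_like)
```

So a short `project-variables.csv` can still produce a long `rap-variables.csv`, which is why trimming the project list matters.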
But I still got the error below. I also tried taking out age at death, and then the error became `unrecognized arguments: age`.
dx: error: unrecognized arguments: age at death
Error in system(table_exporter_command, intern = TRUE) : error in running command
I took out more variables, but still got the error below. It seems related to the "(" in some variable names, but maybe also to the destination folder setup?
readr::read_csv(here::here("data-raw/rap-variables.csv")) %>%
+ dplyr::pull(rap_variable_name) %>%
+ ukbAid::create_csv_from_database()
Rows: 1028 Columns: 3
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): field_id, rap_variable_name, id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ℹ Started extracting the variables and converting to CSV.
! This function runs for quite a while, at least 5 minutes or more. Please be patient to let it finish.
sh: 1: Syntax error: "(" unexpected
dxpy.exceptions.ResourceNotFound: The specified folder could not be found in project-G9zB9B8JqFgQ6Pjx5kPGzzvQ, code 404. Request Time=1697813300.5873065, Request ID=1697813300794-198333
The destination folder does not exist
✔ Finished saving to CSV. Check "/mnt/project/users/jiezhang" or the project folder on the RAP to see that it was created.
[1] NA NA
Warning message: In system(table_exporter_command, intern = TRUE) : running command 'dx run app-table-exporter --brief --wait -y -idataset_or_cohort_or_dashboard=record-GXZ2k40JbxZx7xYGF66y45Yq -ifield_names='Sex' -ifield_names='Date of birth' -ifield_names='Year of birth' -ifield_names='Waist circumference | Instance 0' -ifield_names='Hip circumference | Instance 0' -ifield_names='Standing height | Instance 0' -ifield_names='Month of birth' -ifield_names='UK Biobank assessment centre | Instance 0' -ifield_names='Non-cancer illness year/age first occurred | Instance 0' -ifield_names='Pulse rate (during blood-pressure measurement) | Instance 0' -ifield_names='Birth weight known | Instance 0' -ifield_names='Job code at visit - entered | Instance 0' -ifield_names='Number of self-reported non-cancer illnesses | Instance 0' -ifield_names='Number of treatments/medications taken | Instance 0' -ifield_names='Townsend deprivation index at recruitment' -ifield_names='Reason lost to follow-up' -ifield_names='Date lost to follow-up' -ifield_names='Date of consenti [... truncated]
Yea, I also know that `'` or `"` can cause some problems... I think it will require some coding on my end to fix the problem they have on their end :angry:
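For reference, one way the quoting could be handled is base R's `shQuote()`, which quotes strings for the shell so that characters like `(`, `/` and `|` are passed through literally. This is only a sketch under assumptions: `build_field_args()` is a hypothetical helper rather than a ukbAid function, and the only `dx` flag used is the `-ifield_names=` one visible in the command above.

```r
# Variable titles copied from the command shown in the warning above.
field_names <- c(
  "Sex",
  "Non-cancer illness year/age first occurred | Instance 0",
  "Pulse rate (during blood-pressure measurement) | Instance 0"
)

# shQuote() is base R; type = "sh" asks for Bourne-shell style quoting.
# build_field_args() is a hypothetical helper, not part of ukbAid.
build_field_args <- function(field_names) {
  paste0("-ifield_names=", shQuote(field_names, type = "sh"))
}

field_args <- build_field_args(field_names)
cat(field_args, sep = "\n")

# A title containing an apostrophe (made up for this example) would end a
# hand-written '...' string early; shQuote() picks a quoting style that
# keeps the whole title as one argument.
cat(shQuote("Someone's made-up title (example)", type = "sh"), "\n")

# Assumption: the real call would look roughly like this (not run here).
# system2("dx", c("run", "app-table-exporter", field_args))
```

An apostrophe inside a hand-quoted name would be consistent with the `unrecognized arguments` and `"(" unexpected` errors above, but I have not confirmed that this is the cause.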
Damn, I have no idea what is going on here... I'll keep digging, hopefully I'll have a solution by Monday :grimacing:
Thank you so much, I really appreciate your kind help. I also tried only two variables (age and weight), but still got an error. Best, Jie
I think I fixed it, but not sure. Could you test it out with your fuller list of variables?
I only tried with two variables: age and weight. This is the error I got:
ukbAid::subset_rap_variables(instances = 0:9)
ℹ Updated the "data-raw/rap-variables.csv" based on the selected project variables.
A tibble: 5 × 3
  field_id  rap_variable_name   id
1 p31       Sex                 p31
2 p21002_i0 Weight | Instance 0 p21002
3 p21002_i1 Weight | Instance 1 p21002
4 p21002_i2 Weight | Instance 2 p21002
5 p21002_i3 Weight | Instance 3 p21002
readr::read_csv(here::here("data-raw/rap-variables.csv")) %>%
+ dplyr::pull(rap_variable_name) %>%
+ ukbAid::create_csv_from_database()
Rows: 5 Columns: 3
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): field_id, rap_variable_name, id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ℹ Started extracting the variables and converting to CSV.
! This function runs for quite a while, at least 5 minutes or more. Please be patient to let it finish.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dxpy/scripts/dx.py", line 2858, in run_one
dxexecution.wait_on_done()
File "/usr/local/lib/python3.8/dist-packages/dxpy/bindings/dxjob.py", line 283, in wait_on_done
raise DXJobFailureError(err_msg)
dxpy.exceptions.DXJobFailureError: Job has failed because of AppError: Invalid characters found in field names at position(s) 2, 3, 4, 5 of the input.
dxpy.utils.resolver.ResolutionError: Unable to resolve "data-jiezhang-kimo.csv" to a data object or folder name in '/'
✔ Finished saving to CSV. Check "/mnt/project/users/jiezhang" or the project folder on the RAP to see that it was created.
[1] "job-Gb3yFV0JqFgbzfbkx3PgZVKB" NA
Warning message:
In system(table_exporter_command, intern = TRUE) :
running command 'dx run app-table-exporter --brief --wait -y -idataset_or_cohort_or_dashboard=record-GXZ2k40JbxZx7xYGF66y45Yq -ifield_names="Sex" -ifield_names="Weight | Instance 0" -ifield_names="Weight | Instance 1" -ifield_names="Weight | Instance 2" -ifield_names="Weight | Instance 3" -ioutput=data-jiezhang-kimo' had status 1
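The positions in the `Invalid characters found in field names` message line up with the four `Weight | Instance N` titles. A quick base R check along these lines shows which names would be flagged. The allowed character set (letters, digits and underscores) is my assumption, not something documented in this thread.

```r
# Names exactly as they appear in the failing command above.
field_names <- c(
  "Sex",
  "Weight | Instance 0",
  "Weight | Instance 1",
  "Weight | Instance 2",
  "Weight | Instance 3"
)

# Assumption: only letters, digits and underscores are accepted.
has_invalid <- grepl("[^A-Za-z0-9_]", field_names)
invalid_positions <- which(has_invalid)
print(invalid_positions)

# The `field_id` column from the tibble above, for comparison.
field_ids <- c("p31", "p21002_i0", "p21002_i1", "p21002_i2", "p21002_i3")
ids_have_invalid <- any(grepl("[^A-Za-z0-9_]", field_ids))
print(ids_have_invalid)
```

Under that assumption the `field_id` values would pass where the titles do not, though whether the exporter accepts them in place of the titles would still need testing.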
We did some updates today and tried to fix some issues. We got it to run properly, but now we are dealing with an issue where the UK Biobank variables are different from the ones in RAP (e.g. the Townsend index, which has two variables, one of which, `p189`, is restricted and we can't access it, so it gives an error). We'll try to find a programmatic way to deal with this, but in the meantime, you have to manually look through the UK Biobank documentation and check whether each variable is restricted or not.
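Until there is a programmatic fix, a stopgap could be to keep a small hand-written list of restricted ids and drop them before exporting. The sketch below uses a made-up three-row table with the same `field_id` and `id` columns as `rap-variables.csv`. `restricted_ids` is a hypothetical, manually maintained vector, and `p189` is the only id taken from this thread.

```r
# Made-up stand-in for the contents of data-raw/rap-variables.csv.
rap_variables <- data.frame(
  field_id = c("p31", "p189", "p21002_i0"),
  id = c("p31", "p189", "p21002"),
  stringsAsFactors = FALSE
)

# Hypothetical, hand-maintained list of ids found to be restricted.
restricted_ids <- c("p189")

# Keep only the rows whose id is not on the restricted list.
allowed <- rap_variables[!rap_variables$id %in% restricted_ids, , drop = FALSE]
print(allowed$field_id)
```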
Thanks for the update. I tried again and could not even download the kimo project at the first step (after opening the UKB RAP and installing the ukbAid package). Are there any changes to the process that I should be aware of?
── Downloading your GitHub project ───────────────────────────────────────────────
ℹ Lastly, we need to download your project. Please answer this question.
ℹ Defaulting to 'https' Git protocol
Error in `gh::gh()`:
! GitHub API error (404): Not Found
✖ URL not found:
I think this has been fixed :star_struck:
I opened my project (kimo) on the UKB RAP and followed the instructions at https://steno-aarhus.github.io/ukbAid/using-rap.html and https://github.com/steno-aarhus/kimo/blob/main/data-raw/create-data.R, but I got the errors below:
readr::read_csv(here::here("data-raw/rap-variables.csv")) %>% dplyr::pull(rap_variable_name) %>% ukbAid::create_csv_from_database()
Error in system(table_exporter_command, intern = TRUE) : cannot popen 'dx run app-table-exporter --brief --wait -y -idataset_or_cohort_or_dashboard=record-GXZ2k40JbxZx7xYGF66y45Yq -ifield_names='Verbal interview duration | Instance 0' -ifield_names='Verbal interview duration | Instance 1' -ifield_names='Verbal interview duration | Instance 2' -ifield_names='Verbal interview duration | Instance 3' -ifield_names='Biometrics duration | Instance 0' -ifield_names='Biometrics duration | Instance 1' -ifield_names='Biometrics duration | Instance 2' -ifield_names='Biometrics duration | Instance 3' -ifield_names='Sample collection duration | Instance 0' -ifield_names='Sample collection duration | Instance 1' -ifield_names='Sample collection duration | Instance 2' -ifield_names='Sample collection duration | Instance 3' -ifield_names='Conclusion duration | Instance 0' -ifield_names='Conclusion duration | Instance 1' -ifield_names='Conclusion duration | Instance 2' -ifield_names='Conclusion duration | Instance 3' -ifield_names='Heel ultrasound m