laderast / xvhelper

An R package to help decode Apollo datasets
https://laderast.github.io/xvhelper/
MIT License

Issues with variables with multiple values #4

Open up2thetop opened 7 months ago

up2thetop commented 7 months ago

Hi, I am using the package to decode UKB variables. I am running into some issues when a given variable has multiple values. For instance, for Field ID 6177 (Medication for cholesterol, blood pressure or diabetes), an individual can have a combination of values (e.g., Blood pressure medication | Insulin). When I apply the code below, it returns NA for the second part of the value (Blood pressure medication | Insulin -> Blood pressure medication | NA).

df2 <- df |>
  xvhelper::decode_single(coded_col_df) |>
  xvhelper::decode_multi_purrr(coded_col_df) |>
  xvhelper::decode_multi(coded_col_df)

I have also tried using decode_df(), but it returned the same result: NA for every value except the first one in the list. Could you please help with this?

Thanks!

laderast commented 6 months ago (April 15, 2024)

Ah, I'll do my best. I'm no longer at DNAnexus / UK Biobank, and I no longer have any access to UKB except through the Showcase. Can you show me an example (no PHI) of this in the data? I'll need to see the values and a few lines of the field.

I'm looking at the UK Biobank Showcase: https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=6177

up2thetop commented 6 months ago

Thanks for offering to help. I am now trying it out with health outcomes data for Field ID 41202 and am running into a similar issue: for this variable, every value is converted to NA.

For instance, the data (saved as df) originally looks something like this, where I replaced the actual participant.eid values with the row number:

  participant.eid participant.p41202
1               1 ['K529', 'N320']
2               2 ['C443', 'I251', 'K449', 'N390', 'S011']
3               3 ['N608']

Then I run the code below:

df |> xvhelper::decode_single(coded_col_df) |> xvhelper::decode_multi_purrr(coded_col_df)

The decoded data looks like:

  participant.eid participant.p41202
1               1 NA|NA
2               2 NA|NA|NA|NA|NA
3               3 NA

Please let me know if you need additional information that might be helpful.

Thanks.


laderast commented 6 months ago

@up2thetop - thanks for the example. I think the issue is that the multi-value fields are now returned with single quotes around each element (they used double quotes before). I've posted a fix; can you reinstall and try it out?
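For anyone who needs a workaround before reinstalling, here is a minimal base-R sketch of quote-agnostic parsing for multi-value fields like the ones above. It is not xvhelper's code: `split_multi_value()`, `decode_multi_value()`, and the `lookup` vector are hypothetical. In practice the code-to-meaning table would come from the codings data (coded_col_df in this thread), whose structure isn't shown here, so the named vector is an assumption, as is the assumption that elements never contain commas. The last lines show one plausible way the "NA|NA" symptom can arise: if the quotes stay attached to each code, every lookup misses.

```r
# Hypothetical helpers (not part of xvhelper) for multi-value fields stored as
# Python-style list strings, e.g. "['K529', 'N320']" or "[\"K529\", \"N320\"]".
# Assumes no element contains a comma.

# Split each list string into a character vector of bare codes,
# accepting either single or double quotes around each element.
split_multi_value <- function(x) {
  inner <- gsub("^\\[|\\]$", "", x)            # drop the surrounding brackets
  parts <- strsplit(inner, ",", fixed = TRUE)  # split on the separating commas
  lapply(parts, function(p) {
    # trim whitespace plus a leading/trailing quote of either style
    gsub("^[[:space:]]*['\"]|['\"][[:space:]]*$", "", p)
  })
}

# Decode each code through a named lookup vector and rejoin with "|".
# Codes missing from the lookup come back as the string "NA".
decode_multi_value <- function(x, lookup) {
  vapply(split_multi_value(x), function(codes) {
    paste(unname(lookup[codes]), collapse = "|")
  }, character(1))
}

# Example with a made-up lookup table
lookup <- c(K529 = "meaning of K529", N320 = "meaning of N320")
decode_multi_value("['K529', 'N320']", lookup)
#> [1] "meaning of K529|meaning of N320"

# If the quotes are left attached, every lookup misses:
paste(unname(lookup[c("'K529'", "'N320'")]), collapse = "|")
#> [1] "NA|NA"
```

Stripping either quote style, rather than matching only one, is what keeps this kind of parser working when an upstream export switches between `'` and `"`.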