suchestoncampbelllab / gwasurvivr

GWAS Survival Package in R

Can we use multiple IDs? #8

Closed: Qiaolan closed this issue 4 years ago

Qiaolan commented 4 years ago

Hi,

Since neither FID nor IID is unique on its own in my data, I wonder if I could use both of them instead of just one. For example, in impute2CoxSurv(), can we pass more than one ID column to the argument id.column?

Thanks!

karaesmen commented 4 years ago

Hi!

gwasurvivr allows only one ID column, and the IDs in it must be unique. In your case, though, you can easily generate such a column by pasting or "uniting" the two columns together.

For example, for a toy data frame called df:

df <- data.frame(
    FID = paste0("FID", 1:10),
    IID = paste0("IID", 21:30),
    cov1 = rnorm(10),
    cov2 = sample(c(TRUE, FALSE), size = 10, replace = TRUE)
)

Its first five rows look like this (cov1 and cov2 are random, so your values will differ):

     FID   IID       cov1  cov2
1   FID1 IID21 -0.7152373 FALSE
2   FID2 IID22 -0.3054538 FALSE
3   FID3 IID23 -0.6130526 FALSE
4   FID4 IID24  0.5737988  TRUE
5   FID5 IID25  0.9626100  TRUE

You can paste the columns into one:

df$ids <- paste(df$FID, df$IID, sep="_")

Or you can use the unite() function from the tidyr package:

library(tidyr)
df <- unite(df, "ids", FID, IID, remove = FALSE)

With unite(), your new data frame will look like this (paste() produces the same ids column, but appends it as the last column):

           ids   FID   IID       cov1  cov2
1   FID1_IID21  FID1 IID21 -0.7152373 FALSE
2   FID2_IID22  FID2 IID22 -0.3054538 FALSE
3   FID3_IID23  FID3 IID23 -0.6130526 FALSE
4   FID4_IID24  FID4 IID24  0.5737988  TRUE
5   FID5_IID25  FID5 IID25  0.9626100  TRUE

Then provide this new column to the impute2CoxSurv function by setting id.column="ids".
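Because gwasurvivr needs the values in the ID column to be unique, it may be worth checking the combined column before using it. A minimal sketch with hypothetical toy data in which neither FID nor IID is unique on its own:

```r
# Hypothetical toy covariate table: neither FID nor IID is unique by itself
df <- data.frame(
  FID  = c("F1", "F1", "F2", "F2"),
  IID  = c("1", "2", "1", "2"),
  cov1 = c(0.1, -0.4, 1.2, 0.3)
)

# anyDuplicated() returns the index of the first duplicate, or 0 if none
anyDuplicated(df$FID) > 0  # TRUE: FID alone has duplicates
anyDuplicated(df$IID) > 0  # TRUE: IID alone has duplicates

# The FID_IID combination identifies every row
df$ids <- paste(df$FID, df$IID, sep = "_")
stopifnot(anyDuplicated(df$ids) == 0)
df$ids
```

One caveat: if the original IDs can themselves contain the separator (here "_"), two different FID/IID pairs could paste to the same string. The anyDuplicated() check would catch that, and a separator that never appears in your IDs avoids it.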

Qiaolan commented 4 years ago

Hi Karaesmen! Thanks for your reply!

As you can see, a new ID column now appears in the covariate file. But what about the .sample file? How does this new ID match the IDs in the .sample file? Thank you!

karaesmen commented 4 years ago

Yes, you would have to change the .sample file as well. Following the same example above, you can paste the IDs together and assign the result to the IID column, for example. Save this as a separate copy rather than overwriting the original (say, modified.sample), and provide the path of this modified file to the impute2CoxSurv function. Then use IID, holding the same combined IDs, as the main ID column of your covariate table too, and set id.column="IID".
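The steps above could look roughly like this. This is a minimal sketch, not gwasurvivr code. It writes a hypothetical toy .sample file, assumes the standard Oxford headers ID_1 and ID_2 (playing the roles of FID and IID), and puts the combined ID into ID_2 while leaving the column-type row and the row order untouched. The covariate table then gets the same combined IDs in its IID column. The sample.file argument name in the final comment is an assumption; check ?impute2CoxSurv for the exact signature.

```r
# Hypothetical toy .sample file: header line, column-type line, then one
# line per sample in the same order as the samples in the impute2 file
orig_sample <- tempfile(fileext = ".sample")
writeLines(c(
  "ID_1 ID_2 missing sex",
  "0 0 0 D",
  "F1 1 0 1",
  "F1 2 0 2",
  "F2 1 0 1"
), orig_sample)

# Read everything as character so IDs keep their exact spelling
smp <- read.table(orig_sample, header = TRUE, colClasses = "character")

# Row 1 is the type row ("0 0 0 D"): combine IDs only for the sample rows
is_sample <- seq_len(nrow(smp)) > 1
smp$ID_2[is_sample] <- paste(smp$ID_1[is_sample], smp$ID_2[is_sample], sep = "_")

# Save a separate copy; rows are not reordered, so they still match the genotypes
modified_sample <- file.path(tempdir(), "modified.sample")
write.table(smp, modified_sample, quote = FALSE, row.names = FALSE, sep = " ")
readLines(modified_sample)

# Give the (hypothetical) covariate table the same combined IDs in IID
cov <- data.frame(FID = c("F1", "F1", "F2"), IID = c("1", "2", "1"),
                  cov1 = c(0.2, -1.1, 0.7))
cov$IID <- paste(cov$FID, cov$IID, sep = "_")
stopifnot(setequal(cov$IID, smp$ID_2[is_sample]))

# Then, e.g. (other arguments omitted; argument names assumed):
# impute2CoxSurv(sample.file = modified_sample, id.column = "IID", ...)
```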

But be careful with this manipulation: the .sample file gives the order of the samples in the impute2 file, so make sure the row order does not change. Also, don't forget that a typical .sample file must always have, as its first row directly below the header line, the column-type row 0 0 0 D (this can differ depending on the number of columns in your .sample file, so check the head of the original file first).
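To act on these warnings, one option is to compare the modified copy with the original before running the analysis. The sketch below uses a hypothetical helper, check_sample_copy(), and hypothetical toy files. It assumes the IDs were changed only in the second column, so the header line, the type row, and the first ID column should be identical in both files.

```r
# Hypothetical helper: sanity-check a modified .sample copy against the original
check_sample_copy <- function(orig_path, mod_path) {
  orig <- strsplit(readLines(orig_path), "[[:space:]]+")
  mod  <- strsplit(readLines(mod_path), "[[:space:]]+")
  # Extract the j-th whitespace-separated field from every line
  field <- function(rows, j) vapply(rows, function(r) r[j], character(1))
  stopifnot(
    length(orig) == length(mod),              # no samples dropped or added
    identical(orig[[1]], mod[[1]]),           # header line unchanged
    identical(orig[[2]], mod[[2]]),           # type row (e.g. 0 0 0 D) unchanged
    identical(field(orig, 1), field(mod, 1))  # first ID column in the same order
  )
  invisible(TRUE)
}

orig_path <- tempfile(fileext = ".sample")
mod_path  <- tempfile(fileext = ".sample")
bad_path  <- tempfile(fileext = ".sample")
writeLines(c("ID_1 ID_2 missing sex", "0 0 0 D", "F1 1 0 1", "F2 1 0 1"), orig_path)
writeLines(c("ID_1 ID_2 missing sex", "0 0 0 D", "F1 F1_1 0 1", "F2 F2_1 0 1"), mod_path)
# Same samples, but reordered: this should be rejected
writeLines(c("ID_1 ID_2 missing sex", "0 0 0 D", "F2 F2_1 0 1", "F1 F1_1 0 1"), bad_path)

check_sample_copy(orig_path, mod_path)  # passes silently
res <- tryCatch(check_sample_copy(orig_path, bad_path),
                error = function(e) conditionMessage(e))
res
```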