privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Hint for creating a proper bed file from ped for polygenic risk score #490

Closed (nalalenny closed 3 months ago)

nalalenny commented 3 months ago

Hi, I want to use bigsnpr for polygenic risk scoring. My data is provided in ped format, so I have to convert it to a bed file first. At the moment, when using snp_attach(), the contents of the $map and $fam lists are not in the expected format (see below).

$map
    chromosome            marker.ID genetic.dist physical.pos allele1 allele2
1            0 10:101405508-CCTTT-C            0            0       0       I
2            0     10:101723134-T-C            0            0       0       C
3            0     10:102356710-A-T            0            0       0       T

I assume that the flags passed to plink --make-bed might be the cause. I tried several combinations and always ended up with the same result (e.g. plink --file input.ped --make-bed --real-ref-alleles --recode --out output.bed).

Do you have any advice on the proper flags to use, or am I missing something else?

Thanks in advance.

privefl commented 3 months ago

The format of the $map component seems fine to me. What is the error you're getting?

nalalenny commented 3 months ago

Even though the chromosome is always 0? I was wondering whether the data had somehow shifted all the way into the marker.ID column. Further down in the code I get the following error message: Error: 'infos.chr' should have only positive values.

privefl commented 3 months ago

It seems these columns are missing from the original data; missing values are encoded as 0 here.

Column genetic.dist is basically always missing. But you can infer the other 4 columns from marker.ID using something like https://tidyr.tidyverse.org/reference/separate.html.
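A minimal sketch of that idea, assuming every marker.ID follows the chr:pos-allele1-allele2 pattern seen in the three rows posted above. The `map` data frame here is a toy stand-in for obj.bigSNP$map, and mapping the two allele pieces to allele1 and allele2 in ID order is an assumption, not something the data confirms.

```r
library(tidyr)

# Toy stand-in for obj.bigSNP$map (only the three rows shown in the issue)
map <- data.frame(
  chromosome   = c(0L, 0L, 0L),
  marker.ID    = c("10:101405508-CCTTT-C", "10:101723134-T-C", "10:102356710-A-T"),
  genetic.dist = c(0, 0, 0),
  physical.pos = c(0L, 0L, 0L),
  allele1      = c("0", "0", "0"),
  allele2      = c("I", "C", "T"),
  stringsAsFactors = FALSE
)

# Split "chr:pos-a1-a2" on ":" or "-" into four character columns
parsed <- separate(
  map["marker.ID"],
  col  = "marker.ID",
  into = c("chromosome", "physical.pos", "allele1", "allele2"),
  sep  = "[:-]"
)

# separate() returns character columns here; convert the numeric ones
# explicitly. Non-numeric chromosome labels (e.g. "X") would become NA
# and need separate handling.
parsed$chromosome   <- as.integer(parsed$chromosome)
parsed$physical.pos <- as.integer(parsed$physical.pos)

# Copy the inferred columns back into the map by name
map_fixed <- map
map_fixed[names(parsed)] <- parsed
print(map_fixed)
```

The conversion is done with as.integer() rather than `convert = TRUE` because automatic type conversion could turn an allele column consisting only of "T" values into logicals.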

nalalenny commented 3 months ago

Thank you for your input! I used the separate() function and it worked; the output looks much better now:

> marker.ID_df3
    CHR    SNP ID Allele 1      Allele 2
1    10 101405508    CCTTT             C
2    10 101723134        T             C
3    10 102356710        A             T
4    10 104789475        T             G
5    10 106288224        G             C

However, I now have a separate data frame with this data, and I cannot find a way to "merge" it into obj.bigSNP. Do you have any recommendations, or am I missing something? Thanks for your help!

privefl commented 3 months ago

Just assign these to your existing dataframe, with something like obj.bigSNP$map[c(1, 4:6)] <- marker.ID_df3, and then use snp_save(obj.bigSNP) to save the new object with the updated $map.
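To illustrate what that assignment does, here is a small sketch on plain data frames. `map` is a toy stand-in for obj.bigSNP$map and `marker.ID_df3` is rebuilt by hand from the first three rows posted in this thread, so both are illustrative assumptions. The assignment is positional: columns 1, 4, 5 and 6 of the map are overwritten in order, and the map keeps its own column names.

```r
# Toy stand-in for obj.bigSNP$map (three rows from the issue)
map <- data.frame(
  chromosome   = c(0L, 0L, 0L),
  marker.ID    = c("10:101405508-CCTTT-C", "10:101723134-T-C", "10:102356710-A-T"),
  genetic.dist = c(0, 0, 0),
  physical.pos = c(0L, 0L, 0L),
  allele1      = c("0", "0", "0"),
  allele2      = c("I", "C", "T"),
  stringsAsFactors = FALSE
)

# Hand-built stand-in for the data frame obtained with separate()
marker.ID_df3 <- data.frame(
  `CHR`      = c(10L, 10L, 10L),
  `SNP ID`   = c(101405508L, 101723134L, 102356710L),
  `Allele 1` = c("CCTTT", "T", "A"),
  `Allele 2` = c("C", "C", "T"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Both tables must describe the same variants in the same order
stopifnot(nrow(marker.ID_df3) == nrow(map))

# Overwrite chromosome, physical.pos, allele1, allele2 (columns 1, 4, 5, 6)
map[c(1, 4:6)] <- marker.ID_df3

print(names(map))  # the original $map column names are preserved
print(map)

# With a real bigSNP object, the same two steps would be:
# obj.bigSNP$map[c(1, 4:6)] <- marker.ID_df3
# snp_save(obj.bigSNP)
```

Because the match is by position rather than by name, it is worth checking that the replacement data frame has the same number of rows and the same variant order as $map before assigning.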

nalalenny commented 3 months ago

Great, that worked perfectly. Thanks a lot for your support!

privefl commented 3 months ago

You can close the issue if there is nothing else on this.