No $fam created with snp_readBGEN(). How to map genotype back to individuals?

privefl / bigsnpr

R package for the analysis of massive SNP arrays.

https://privefl.github.io/bigsnpr/


No $fam created with snp_readBGEN(). How to map genotype back to individuals? #489

Closed el-rs closed 2 months ago

el-rs commented 3 months ago

Hi, I'm using the snp_readBGEN() function to read in UKB imputed data. This generates the $genotype and $map objects, but not $fam. How would one map the genotypes back to individual sample EIDs? Also am I right in assuming the SNPs in genotype matrix are ordered the same as input list? Thanks!

privefl commented 3 months ago

> Also am I right in assuming the SNPs in genotype matrix are ordered the same as input list?

Yes

> How would one map the genotypes back to individual sample EIDs?

Use match() with the EIDs from both the CSV file and the sample file.

The only thing I could store in the $fam is what is provided to me: the indices (not the EIDs) of the samples read from the BGEN/sample. Would that help? I am not sure.

What I usually do is fill the $fam directly after reading, then save the expanded object. Cf. e.g. https://github.com/privefl/paper-infer/blob/main/code/prepare-geno-simu.R#L49-L52
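As a minimal sketch of the match() approach described above, here is a base-R toy example. The data frames, the column names (`ID_2`, `eid`, `pheno`) and the assumption that the sample file carries a type row to drop are all illustrative, not taken from the linked script.

```r
# Toy stand-in for the .sample file that accompanies the BGEN files.
# Assumption: the first data row is a type row (all zeros here), so it is dropped.
sample_file <- data.frame(
  ID_1 = c(0, 1001, 1002, 1003, 1004),
  ID_2 = c(0, 1001, 1002, 1003, 1004)
)
sample_ids <- sample_file$ID_2[-1]

# Toy stand-in for the phenotype CSV (hypothetical columns `eid` and `pheno`).
csv <- data.frame(eid = c(1003, 1001, 9999), pheno = c(0.2, 1.5, -0.7))

# Position of each CSV individual in the sample file (NA if absent).
ind_in_sample <- match(csv$eid, sample_ids)
keep <- !is.na(ind_in_sample)
ind_row <- ind_in_sample[keep]  # indices one would pass as `ind_row`

# Rows of the genotype matrix follow `ind_row`, so this table lines up with them.
fam <- data.frame(
  eid = csv$eid[keep],
  sample_index = ind_row,
  pheno = csv$pheno[keep]
)
print(fam)
```

A table built this way could then be stored as the $fam element of the object returned after reading, before saving it again.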

el-rs commented 3 months ago

This helps! To clarify: when filtering by sample indices, does the $genotype object contain the samples in the same order as given in the ind_row argument? I ask because there is no specific ID variable by which to merge the genotypes with the samples.

Also, what could be the reason $fam is not getting generated?

Are the dosages obtained in $genotype for allele2?

Thanks again!

privefl commented 3 months ago

1. Yes, same order as ind_row.
2. The only thing I could store in the $fam is what is provided to me: the indices (not the EIDs) of the samples read from the BGEN/sample.
3. The allele used as reference may depend on the dataset; usually I do some GWAS of some known hits and compare to reported hits in the GWAS Catalog to check the sign.

el-rs commented 3 months ago

Thanks for this! I'm still not sure how to determine which allele the dosages refer to. Is it always the reference allele, or the alternate? Could you please provide further guidance on this? Thank you, I really appreciate your help!

privefl commented 3 months ago

I do not have a definitive answer. You should check against some reference, such as known GWAS hits, or maybe simply allele frequencies.
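The allele-frequency check could be sketched as follows, on simulated data in base R. The reference frequencies, the sample size and the variable names are hypothetical; with real data the reference would come from an external source, and variants with frequency near 0.5 would say little either way.

```r
set.seed(1)
n <- 2000

# Hypothetical reference frequencies of allele2 for four variants.
ref_freq_a2 <- c(0.10, 0.25, 0.40, 0.05)

# Toy dosage matrix (individuals x variants), simulated so that it counts allele2.
G <- sapply(ref_freq_a2, function(p) rbinom(n, size = 2, prob = p))

# Frequency of whichever allele the dosages count.
obs_freq <- colMeans(G, na.rm = TRUE) / 2

# Compare against both hypotheses; the smaller discrepancy suggests the counted allele.
err_a2 <- mean(abs(obs_freq - ref_freq_a2))
err_a1 <- mean(abs(obs_freq - (1 - ref_freq_a2)))
counted <- if (err_a2 < err_a1) "allele2" else "allele1"
print(counted)
```

This only illustrates the comparison itself; on a real genotype matrix the column means would need to be computed with whatever summary function suits that object.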