smgogarten / GWASTools

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis
https://bioconductor.org/packages/GWASTools
16 stars 6 forks source link

Update snpIDs in output of assocRegression to reflect the rsIDs #17

Closed niyati1211 closed 6 months ago

niyati1211 commented 6 months ago

I am running assocRegression under GWASTools. My genotype data is in ped/map format.

I ran the following code to create the gds object: geno_gds <- snpgdsPED2GDS(ped.fn = "random35k_new.ped" , map.fn = "random35k_new.map", out.gdsfn = "/random35k_new.gds") gds_object <- GdsGenotypeReader("random35k_new.gds")

When I run assocRegression:

scanAnnot <- ScanAnnotationDataFrame(phenotype_col_input)
genotype_data <- GenotypeData(gds_object, scanAnnot=scanAnnot) 
chr <- getChromosome(genotype_data)
auto <- range(which(is.element(chr, 1:22)))

res <- assocRegression(genotype_data,
                         "pheno",
                         model.type = c("linear"),
                         gene.action = c("additive"),
                         covar = c("treatment_group", "age", "sex", "bmi", "PC1", "PC2", "PC3", "PC4", "PC5", "PC6", "PC7", "PC8", "PC9", "PC10"),
                         scan.exclude = NULL,
                         CI = 0.95,
                         effectAllele = c("minor"),
                         snpStart = auto[1], snpEnd = auto[2],
                         block.size = 3500)

In my output "res", the values under the snpID column are integers (1,2,3,4,5), which I believe are from my GDS object. The integers are hard to interpret and don't tell me anything about the location/ref/alt allele of the SNP. I am not sure why the snpIDs are converted to integers when my map file does include the rsIDs...

I tried using SnpAnnotationDataFrame but I get various errors one being that chromosome must be integer even though I specify it not to be character (chromosome <- getChromosome(gds_object, char=FALSE).

How should I go about running this analysis so that in my output file, the snpID column values are the rsIDs from my map file? Is there a way to update the snpID object within the GDS? Any help will be appreciated!

Thank you!

smgogarten commented 6 months ago

The way GWASTools is written, snpID and chromosome must always be integers. From the man page of GdsGenotypeReader:

The GDS file must contain the following variables: • 'snp.id': a unique integer vector of snp ids • 'snp.chromosome': integer chromosome codes

Default values for chromosome codes are 1-22=autosome, 23=X, 24=XY, 25=Y, 26=M. The defaults may be changed with the arguments ‘autosomeCode’, ‘XchromCode’, ‘XYchromCode’, ‘YchromCode’, and ‘MchromCode’.

Using getChromosome(gds_object, char=TRUE) will return a character vector with X instead of 23, etc. I suggest you create a data frame with integer snpID, character chromosome and rsID columns, and merge it with the results of assocRegression (see documentation for base R merge or dplyr left_join).