kjgleason / Primo

Conditional Analysis - Error in data.frame PP_Grouped #4

Closed Noavdplas closed 3 years ago

Noavdplas commented 4 years ago

As part of a research internship, I would like to use Primo to investigate the genetic pleiotropy of human aggressive behavior, subcortical brain volumes and a set of NMR metabolomics summary statistics including amino acids and lipoprotein particle concentrations. Using the vignette on the Primo GitHub page, I attempted to apply Primo to my dataset. However, when running the conditional analysis I get an error at the step that creates the PP_grouped data frame; the error suggests that one of the required input variables has 0 rows. So far I have been unable to determine what causes this error or how to remedy it. Could you advise me on how to proceed?

kjgleason commented 4 years ago

Thanks for using the package, Noavdplas; your analysis sounds interesting. Could you paste the error message you are getting so that I can help you troubleshoot?

It could also be helpful to start narrowing in on where the issue is occurring. Could you run the following directly and confirm that they return proper results?

PP_group <- Primo::collapse_pp_num(Primo_obj$post_prob, req_idx=gwas_col)
PP_group <- Primo::collapse_pp_num(Primo_obj$post_prob[gwas_idx,], req_idx=gwas_col)

Primo_obj should be what was returned from your run of Primo_tstat() or Primo_modT(); gwas_col should be whichever column number in your dataset holds the GWAS trait; gwas_idx should hold the row numbers of the GWAS SNPs you are investigating. You could set that manually and only use a subset of the GWAS SNPs.
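One thing worth checking when testing with a small, manually chosen subset: base R drops a matrix to a plain vector when a single row is selected. The sketch below uses a made-up `post_prob` matrix and made-up indices (not Primo output) to show the difference. Whether `collapse_pp_num()` accepts a bare vector is not stated in this thread, so `drop = FALSE` is shown only as a defensive option.

```r
# Toy stand-in for Primo_obj$post_prob (hypothetical values, 4 SNPs x 2 patterns)
post_prob <- matrix((1:8) / 10, nrow = 4, ncol = 2)

# Hypothetical row indices of GWAS SNPs; keep only the first one for a quick test
gwas_idx <- c(2L, 4L)
sub_idx <- head(gwas_idx, 1)

# Selecting a single row drops the matrix structure by default
pp_vec <- post_prob[sub_idx, ]

# drop = FALSE keeps a 1-row matrix
pp_mat <- post_prob[sub_idx, , drop = FALSE]

print(is.matrix(pp_vec))
print(dim(pp_mat))
```

With two or more rows in `sub_idx` both forms return a matrix, so this only matters for single-SNP tests.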

Also run the following and make sure that an integer vector is being returned (matching the row indices of your GWAS SNPs):

gwas_idx <- which(subset(IDs, select=snp_col)[[1]] %in% gwas_snps)

IDs is your data.table/data.frame holding the identifiers; snp_col is the character string naming the column that holds the SNP identifiers in that data.table/data.frame; gwas_snps is a character vector of the identifiers of your GWAS SNPs.
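A minimal sketch of that check, using toy data in place of the real objects (the SNP names, the `gene` column and the `missing_snps` helper are made up for illustration). It runs the same `which()` line and then confirms the result is a non-empty integer vector, and lists any GWAS SNPs that were not found in `IDs`.

```r
# Toy identifiers (hypothetical; substitute your own IDs, snp_col and gwas_snps)
IDs <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4"),
                  gene = c("g1", "g1", "g2", "g2"),
                  stringsAsFactors = FALSE)
snp_col <- "SNP"
gwas_snps <- c("rs2", "rs4", "rs9")

# Same expression as in the comment above
gwas_idx <- which(subset(IDs, select = snp_col)[[1]] %in% gwas_snps)

# An empty result here would mean no GWAS SNP matched any identifier
stopifnot(is.integer(gwas_idx), length(gwas_idx) > 0)

# GWAS SNPs with no matching row in IDs
missing_snps <- setdiff(gwas_snps, IDs[[snp_col]])

print(gwas_idx)
print(missing_snps)
```

If `gwas_idx` comes back empty on real data, a common cause is a mismatch in identifier format between `gwas_snps` and the SNP column, which `missing_snps` makes easy to spot.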

Noavdplas commented 4 years ago

Thank you! The exact error message is:

conditional_results <- run_conditional_gwas(Primo_obj=primo_results, IDs=myID_gwas, gwas_snps=gwas_snps_LD1, pvals=pvals_m, LD_mat=LD_matrix, snp_info=snp_info, pp_thresh=0.8, LD_thresh=0.9, dist_thresh=5e3, pval_thresh=1e-2)
Error in data.frame(PP_grouped, nQTL_final = nQTL_final, top_pattern_precond = orig_max, :
  arguments imply differing number of rows: 1918, 0
Calls: run_conditional_gwas -> data.frame

Running the PP_group code produces the desired output.

kjgleason commented 4 years ago

The error message implies that PP_grouped has 1918 rows (is this the expected number, given the combinations of GWAS SNPs and molecular traits present in your data?), but that at least one of the other objects combined in that step has length zero (nQTL_final, orig_max, sp_vec or poss_assoc). Could you help me identify which one(s) are length zero so we can troubleshoot further? To do so, please manually run the code for run_conditional_gwas() from lines 241-293 in the conditional.R file (https://github.com/kjgleason/Primo/blob/master/R/conditional.R). You'll need to manually assign the function arguments to your data identifiers (e.g. Primo_obj=primo_results). My apologies for the extra work, but I'm not reproducing the error when running with some of my data.
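Once those lines have been run manually, the object sizes can be compared in one step. The sketch below uses toy objects with the same names as in the error message; the values are made up and `sp_vec` is deliberately left empty to mimic the failure. It uses only base R and does not call any Primo function.

```r
# Toy stand-ins for the objects combined in the failing data.frame() call
# (hypothetical values; in practice these come from running the function body)
PP_grouped <- matrix(0.5, nrow = 3, ncol = 2)
nQTL_final <- c(1L, 2L, 1L)
orig_max <- c("a", "b", "c")
sp_vec <- integer(0)                 # deliberately empty to reproduce the error
poss_assoc <- c(TRUE, FALSE, TRUE)

# NROW() gives the row count for matrices and the length for plain vectors
sizes <- vapply(list(PP_grouped = PP_grouped, nQTL_final = nQTL_final,
                     orig_max = orig_max, sp_vec = sp_vec,
                     poss_assoc = poss_assoc),
                NROW, integer(1))
print(sizes)

# Names of the objects that would trigger "differing number of rows: ..., 0"
empty <- names(sizes)[sizes == 0]
print(empty)

# Combining a non-empty object with an empty one fails in the same way
err <- tryCatch(data.frame(PP_grouped, sp_vec = sp_vec),
                error = function(e) conditionMessage(e))
print(err)
```

Any name reported in `empty` points to the step in the function body where that object is built, which narrows down where the inputs stop matching.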

kjgleason commented 3 years ago

Closing due to more than one year passing since last response.