xinhe-lab / ctwas

package for the causal TWAS project
https://xinhe-lab.github.io/ctwas/
MIT License
32 stars 12 forks source link

Fine mapping with parameter estimation #19

Open jgockley62 opened 4 months ago

jgockley62 commented 4 months ago

I'm trying to estimate parameters for finemapping and for some reason I the first iteration estimates the gene prior to be zero:

 ctwas::ctwas_rss(
          z_gene = z_gene_subset,
          z_snp = z_snp_subset,
          ld_exprvarfs = ordered_ld_exprvarfs,
          ld_R_dir = ld_R_dir,
          ld_regions_custom = regions_file,
          outputdir = outputdir,
          outname = outname,
          estimate_group_prior = T,
          estimate_group_prior_var = T)
   ...
2024-04-18 13:22:30.132874 INFO::Adding R matrix info for chrom 20
2024-04-18 13:22:30.141519 INFO::Adding R matrix info for chrom 21
2024-04-18 13:22:30.14914 INFO::Adding R matrix info for chrom 22
2024-04-18 13:22:30.177372 INFO::Run susie iteratively, getting rough estimate ...
2024-04-18 13:22:30.179785 INFO::run iteration 1
starting worker pid=3144 on localhost:11502 at 13:22:31.111
Loading required package: ctwas
loaded ctwas and set parent environment
2024-04-18 13:22:57.624383 INFO::After iteration 1, gene prior 0:, SNP prior:0.000186636804777902
2024-04-18 13:22:57.658267 INFO::run iteration 2
starting worker pid=4189 on localhost:11502 at 13:22:58.565

This seems to result in an error in the second iteration specifically here as prior_variance is never assigned a variable. Unclear on what I may have done incorrectly or if this is a potential bug, but I'd greatly appreciate any thoughts on what might be going on!

sq-96 commented 4 months ago

@jgockley62 Hi, did you use the example data (one region) to estimate parameters? This will cause an error. We suggest using genome-wide data (all regions) to estimate parameters.

jgockley62 commented 4 months ago

Ahhhh ok I will try estimating genome-wide. Thanks!

HackerLZH commented 4 months ago
2024-04-22 22:14:52.18408 INFO::Adding R matrix info, as genotype is not given
2024-04-22 22:14:52.537894 INFO::Adding R matrix info for chrom 1
2024-04-22 22:27:55.469278 INFO::Adding R matrix info for chrom 2
Error in (function (cond)  : 
  error in evaluating the argument 'y' in selecting a method for function 'crossprod': subscript out of bounds
Calls: system.time ... sapply -> lapply -> FUN -> crossprod -> <Anonymous>
In addition: Warning messages:
1: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 1.1 GiB
2: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 1.1 GiB

@sq-96 Hello, what's wrong with me? I'm estimating parameters with genome-wide data after having imputed gene z-scores. And I use ld downloaded from your site. And I use my own gwas summary statistics. Please.

sq-96 commented 4 months ago

@HackerLZH Could you attach your code here?

HackerLZH commented 4 months ago

@sq-96 It's in fact R format, but can't be uploaded, so I reformated. run_ctwas.txt Thanks, please.

jgockley62 commented 4 months ago

Hi @sq-96 Apologies to un fork this back. I've switched to the multigroup fork for parallel processing, it seems to run into resource issues randomly. For example when running impute_expr_z() it will throw an error on say chromosome 3, but when I re-run it, it will run fine. I'm not sure if its just an issue handling the cluster build every chromosome or not, but wondering if you had any advice. Should I switch to the multigroup_dev branch?

Lost warning messages Error: no more error handlers available (recursive errors?); invoking 'abort' restart Lost warning messages Error: no more error handlers available (recursive errors?); invoking 'abort' restart Error in[.data.frame(z_snp, z.idx, ) : INTEGER() can only be applied to a 'integer', not a 'unknown type #29' In addition: Warning message: In[.data.frame(z_snp, z.idx, ) : type 29 is unimplemented in 'type2char'

fengx25 commented 4 months ago

@jgockley62 Hi, did you use the example data (one region) to estimate parameters? This will cause an error. We suggest using genome-wide data (all regions) to estimate parameters.

Hi, I have a similar problem. When performing cTWAS at a single locus, it's necessary to provide the estimated prior inclusion probabilities for genes and variants (group_prior) as well as the estimated effect sizes for genes and variants (group_prior_var). Is it possible to estimate these two parameters in the single locus model? This approach is preferred because we specifically aim to conduct fine-mapping analysis within a region around each top SNP. Performing cTWAS genome-wide for parameter estimates consumes considerable resources and time.

sq-96 commented 4 months ago

@HackerLZH @jgockley62 @fengx25 Hi All, we are going to release a new version very soon, with improved speed and memory usage. I believe it would solve problems you posted. I will notify you once it is ready. Thanks for your patience!