jennasimit / flashfm

Flashfm: multi-trait fine-mapping that uses GWAS summary statistics from several traits
https://jennasimit.github.io/flashfm/

Problems with makeSNPgroups2 function #3

Open HannahCdH opened 1 month ago

HannahCdH commented 1 month ago

Hi,

we have observed two problems with flashfm, using 4 traits with UK Biobank data, that we could not solve on our side.

Both stem from the makeSNPgroups2 function:

  1. Some regions produce the following error: "wh[i, 1] : indices out of bounds". This is especially troublesome because, in one region, repeated runs switch at random between producing this error and completing without it.

We observed that the error does not occur when ng == 3 (meaning 3 SNP groups); when it does occur, ng == 2. With 3 SNP groups the sizes are 52, 57 and 2, whereas with 2 SNP groups the sizes are 109 and 2. It looks as if the first SNP group is sometimes split into two and sometimes not. We were not able to work out at which point this split is decided.

We traced the error itself to the loop on line 94 inside the makeSNPgroups2 function.

  2. In other regions the following error message occurs: "error in hclust(rd, method = "complete") : n >= 2 objects necessary for clustering". We traced the error back to this line in the groupmulti function: h <- hclust(rd, method = "complete")

groupmulti is called in this part of the makeSNPgroups2 function: sg2 <- groupmulti(SM2list,Xmat,is.snpmat,min.mppi,minsnpmppi,r2.minmerge)

The error occurs only in the second call to groupmulti.
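For reference, the hclust error can be reproduced with base R alone: hclust stops whenever the dissimilarity object it receives describes fewer than two objects. The sketch below assumes that rd is a "dist" object built from a SNP-by-SNP matrix; safe_hclust is a hypothetical helper for illustration, not part of flashfm or flashfmZero, and the toy matrices are not real data.

```r
# Minimal sketch (base R only). safe_hclust is a hypothetical helper,
# not a flashfm/flashfmZero function.
safe_hclust <- function(rd, method = "complete") {
  n <- attr(rd, "Size")          # number of objects described by a "dist" object
  if (is.null(n) || n < 2) {
    return(NULL)                 # nothing to cluster: fewer than two objects
  }
  hclust(rd, method = method)
}

# Toy dissimilarities: one SNP only, and three SNPs.
rd_one   <- as.dist(matrix(0, nrow = 1, ncol = 1))
rd_three <- as.dist(matrix(c(0.0, 0.1, 0.9,
                             0.1, 0.0, 0.8,
                             0.9, 0.8, 0.0), nrow = 3, byrow = TRUE))

# Calling hclust directly on the single-SNP case raises an error.
direct <- tryCatch(hclust(rd_one, method = "complete"),
                   error = function(e) e)

# The guarded version returns NULL instead of failing.
guarded_one   <- safe_hclust(rd_one)
guarded_three <- safe_hclust(rd_three)
```

If this is what happens in the second groupmulti call, the set of SNPs passed to the clustering step would contain at most one SNP at that point; this is a guess from the error text, not something confirmed in the package source.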

We would be grateful for your input and help as to why these errors occur and how we can fix them.

Thank you

Best wishes, Hannah

jennasimit commented 1 month ago

Hi Hannah,

Apologies for this inconvenience. I have updated the grouping functions to be more stable, and I suggest that you download the newest package, flashfmZero, at https://github.com/jennasimit/flashfmZero, which was uploaded today. I will need to add a note to flashfm on this.

flashfmZero has all of the functions from flashfm, more flexible versions from MGflashfm, and some new ones. It also has R2BGLiMS embedded, so you only need to call library(flashfmZero) to use it.

It sounds like you are using the wrapper FLASHFMwithJAM in flashfm, and I recommend that you switch to FLASHFMwithJAMd, which gives more control for the arguments to JAM. The default is jam.nM.iter = 1 and increasing this number will increase the number of iterations (in millions) for JAM - often 1M is enough, but if you are finding that the results are unstable, then I suggest that you use jam.nM.iter = 5.

In FLASHFMwithJAMd, the maximum number of causal variants is set to 1 by default (maxcv=1) and it is dynamically adjusted, learning from the data in a parsimonious way. This works well when the LD comes from an external reference panel, but if you are using the UKBB LD, then you could start with a higher initial value, as is done in FLASHFMwithJAM (e.g. maxcv=10 is used there).

I hope this helps, and please let me know if this solves the issues that came up in your analysis.

Best wishes, Jenn

HannahCdH commented 3 weeks ago

Hi Jenn,

thank you very much for your quick response. I tried the newest version of the flashfm functions in flashfmZero. Sadly, the indexing error ("wh[i, 1] : indices out of bounds") still occurs in the makeSNPgroups2 function.

I have a signal with around 3200 SNPs. I use four traits and test each trait for an association of at least 10e-6; only traits that fulfil this criterion are used in the flashfm analysis. As input I use Wakefield fine-mapping results as the single-trait fine-mapping results.

The number of signals in which this error occurs is reduced compared to before. I do not actually need the groups themselves; I only need them in order to use the functions PPsummarise and allcredsets.

Please let me know if you have any new ideas as to how to fix this error.

Thank you.

Best regards, Hannah

jennasimit commented 3 weeks ago

Hi Hannah,

Thank you for this extra information. One issue with using Wakefield fine-mapping is that it is limited to the assumption of only one causal variant, so flashfm will only give results with one causal variant with this input. If you use one of the wrappers - FLASHFMwithFINEMAP or FLASHFMwithJAMd - then you will be able to identify multiple causal variants for each trait.

I am curious about the error you are getting though, since that shouldn't be happening in flashfmZero. Are you working from a new workspace with only flashfmZero loaded, and not flashfm? If both libraries are loaded, it is possible that you are still calling the original flashfm code. To be sure that you are calling the flashfmZero code, add flashfmZero:: before your function call.
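As a quick check of which package a function is being picked up from, base R can report the namespace a function object belongs to. The sketch below uses stats::hclust so that it runs without flashfm or flashfmZero installed; where_from is a hypothetical helper name, and the commented-out line shows how the same check might be applied to makeSNPgroups2 once a package is loaded.

```r
# Minimal sketch (base R only). where_from is a hypothetical helper:
# it returns the name of the environment in which a function was defined,
# which for package functions is the package namespace.
where_from <- function(f) environmentName(environment(f))

# Runs anywhere: hclust is defined in the stats namespace.
src <- where_from(stats::hclust)

# With the packages loaded, the same check could be applied to the
# grouping function (left commented out because it needs the package):
# where_from(makeSNPgroups2)   # name of the package whose version is found first

# Prefixing the call with the package name bypasses the search path:
# flashfmZero::makeSNPgroups2(...)   # arguments omitted here
```

If the unqualified check reports flashfm rather than flashfmZero, the original code is still the one being called.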

Since you are not interested in the grouping feature, I have added an option to PPsummarise - setting snpGroups=NULL will allow you to run it without any snpGroup summaries, but you need to download the updated flashfmZero package. Then, use your output from PPsummarise in allcredsets (gives only snps in CS) or allcredsetsPP (gives snps in CS and their MPP).

Let me know how this works for you.

Best wishes, Jenn
