Closed by jdblischak 3 years ago
Hi John,
Sure, this sounds like a very reasonable idea. I mostly worked with UKBB sumstats, so I'm also not sure how often this might happen in practice. I'd rather keep the default behavior as it currently is to stay on the safe side. However, it might make sense to emit a warning in case we observed a "flipped" indel (with a suggestion to invoke the new flag), so that the users are aware of the potential loss of information.
I'm pretty constrained for time at the moment. However, if you can introduce a pull request with this flag, I'll be happy to accept it. Otherwise I'll try to get around to it in one of the next few weekends.
> I'd rather keep the default behavior as it currently is to stay on the safe side.
Agreed
> However, it might make sense to emit a warning in case we observed a "flipped" indel (with a suggestion to invoke the new flag), so that the users are aware of the potential loss of information.
That's a good idea. I'll include some warnings
> I'm pretty constrained for time at the moment. However, if you can introduce a pull request with this flag, I'll be happy to accept it. Otherwise I'll try to get around to it in one of the next few weekends.
I've already started working on it. I'll send the PR when it's ready.
Thanks!
Hi John,
We might need to be able to set both flags, --allow-swapped-indel-alleles and --allow-missing, at the same time, assuming we can have flipped indels as well as a few SNPs that are genuinely missing from the summary stats.
> We might need to be able to set both flags, --allow-swapped-indel-alleles and --allow-missing, at the same time, assuming we can have flipped indels as well as a few SNPs that are genuinely missing from the summary stats.
You can set both flags at the same time. Any variant that is truly missing from the LD matrix (be it a SNP or an indel) will still be removed prior to fine-mapping. The flag --allow-swapped-indel-alleles only saves those indels where the chromosome and position are identical and the alleles are swapped.
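To make the interaction between the two cases concrete, here is a minimal sketch of the matching rule described above. It is not polyfun's code: `classify_variants` and `is_indel` are hypothetical helpers, variants are plain `(chrom, bp, a1, a2)` tuples, and SNPs are assumed to be listed with the same allele order in both inputs.

```python
def is_indel(a1, a2):
    """Treat any variant with a multi-base allele as an indel (assumption)."""
    return len(a1) > 1 or len(a2) > 1


def classify_variants(sumstats, ld_variants, allow_swapped_indel_alleles=False):
    """Split sumstats variants into exact matches, rescued swapped indels,
    and variants that are truly missing from the LD panel."""
    ld_set = set(ld_variants)
    exact, swapped, missing = [], [], []
    for chrom, bp, a1, a2 in sumstats:
        if (chrom, bp, a1, a2) in ld_set:
            exact.append((chrom, bp, a1, a2))
        elif (
            allow_swapped_indel_alleles
            and is_indel(a1, a2)
            and (chrom, bp, a2, a1) in ld_set
        ):
            # Same chromosome and position, alleles swapped: keep it.
            swapped.append((chrom, bp, a1, a2))
        else:
            # Not in the LD panel in either orientation (or flag is off).
            missing.append((chrom, bp, a1, a2))
    return exact, swapped, missing


sumstats = [("1", 1000, "A", "G"), ("1", 2000, "TT", "C"), ("1", 3000, "C", "T")]
ld_variants = [("1", 1000, "A", "G"), ("1", 2000, "C", "TT")]

exact, swapped, missing = classify_variants(
    sumstats, ld_variants, allow_swapped_indel_alleles=True
)
```

In this toy input the indel at position 2000 is rescued, while the SNP at position 3000 is still reported as missing, so it would still need --allow-missing to proceed.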
This is a follow-up to Issue #41 from @jerome-f.
When the allele order for an indel at the same coordinate differs between the sumstats file and the reference genotypes (e.g. A1: C A2: TT vs. A1: TT A2: C), there are 2 possible reasons:
1. The two records are genuinely different variants, i.e. a parallel insertion and deletion at the same base pair coordinate.
2. They are the same variant, and the 2 alleles were simply re-ordered.
The current version of set_snpid_index() assumes the first scenario, and treats the swapped alleles as if they were completely different variants. This prevents any mistakes arising from misinterpreting insertions and deletions. However, it has other consequences for scenario 2. These indels are removed prior to fine-mapping, thus removing potential causal variants:
https://github.com/omerwe/polyfun/blob/b4655d0cdea44da39bbc60e664b2146228b241ad/finemapper.py#L270-L273
Also, it always invalidates the cached LD matrix file, and thus the LD matrix is always re-calculated:
https://github.com/omerwe/polyfun/blob/b4655d0cdea44da39bbc60e664b2146228b241ad/finemapper.py#L352-L354
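The effect can be illustrated with a simplified stand-in for the ID construction. This is only a sketch: `make_variant_id` is a hypothetical function, and the assumption that SNP alleles are order-normalised while indel alleles keep their given order is mine, chosen to reproduce the behavior described above rather than copied from set_snpid_index().

```python
def make_variant_id(chrom, bp, a1, a2):
    """Build a string ID for a variant (hypothetical stand-in).

    Assumption: SNP alleles are sorted so their order does not matter,
    whereas indel alleles are kept in the order given.
    """
    is_indel = len(a1) > 1 or len(a2) > 1
    if not is_indel and a2 < a1:
        a1, a2 = a2, a1  # normalise SNP allele order
    return f"{chrom}.{bp}.{a1}.{a2}"


# A SNP gets the same ID regardless of allele order ...
snp_same = make_variant_id(1, 1000, "G", "A") == make_variant_id(1, 1000, "A", "G")

# ... but a swapped indel gets two different IDs, so it looks like
# two unrelated variants when the sumstats are joined to the LD panel.
indel_same = make_variant_id(1, 2000, "C", "TT") == make_variant_id(1, 2000, "TT", "C")

print(snp_same, indel_same)  # True False
```

Under this scheme any indel whose alleles are swapped fails the ID join, which is consistent with both consequences listed above: the variant is dropped, and the cached LD matrix no longer matches the variant list.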
I don't have a good sense for how often there are polymorphic parallel insertions/deletions at the same base pair coordinate. It seems to me like it would be a rare event, but I don't have any data to back up my intuition. And I understand why you would want to be cautious when combining sumstats with a reference panel such as the UKBB. However, in the case of using an insample LD matrix, this seems to only have downsides. If you are fine-mapping with the exact same genotypes you used for the original GWAS, it seems safe to assume that the 2 alleles were simply re-ordered.
Would you be open to adding a flag to finemapper.py to toggle this behavior? The default behavior would remain the same, but users could specify a flag such as --flip-indel-alleles to prevent removing these indels when fine-mapping with an insample LD matrix. I'm happy to implement everything, but I wanted to get your approval first.
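As a rough sketch of what an opt-in flag with an unchanged default could look like, together with the warning suggested elsewhere in the thread: the flag name is the one proposed above, while `build_parser` and `report_flipped_indels` are hypothetical helpers and not part of finemapper.py.

```python
import argparse
import warnings


def build_parser():
    """Parser with a single opt-in flag; store_true means it defaults to False."""
    parser = argparse.ArgumentParser(description="Sketch of an opt-in indel flag")
    parser.add_argument(
        "--flip-indel-alleles",
        action="store_true",
        help="keep indels whose alleles are swapped relative to the LD matrix",
    )
    return parser


def report_flipped_indels(n_flipped, flag_enabled):
    """Return how many flipped indels are retained; warn if they are dropped."""
    if n_flipped > 0 and not flag_enabled:
        warnings.warn(
            f"{n_flipped} indel(s) have swapped alleles and will be removed; "
            "consider --flip-indel-alleles if using an in-sample LD matrix"
        )
        return 0
    return n_flipped


# With no arguments, the default behavior is unchanged (flag is off).
args = build_parser().parse_args([])
retained = report_flipped_indels(n_flipped=0, flag_enabled=args.flip_indel_alleles)
```

Keeping the flag as `store_true` means existing command lines behave exactly as before, and the warning only fires when information would actually be lost.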