ikmb / eagleimp

GNU General Public License v3.0

WARNING: Analysis skipped: Too few reference data for this region. (Mrefreg = 0) #1

Closed vnsklight27 closed 2 years ago

vnsklight27 commented 2 years ago

Hi Team, great work! I am trying to get hands-on experience with your new imputation software. I am running the commands with my choice of genetic map, ref_vcf and sample_vcf files. I was able to create the qref file from the reference file. The genetic map contains the following information:

chr position COMBINED_rate(cM/Mb) Genetic_Map(cM)
chr21 10326676 . 0.584144
chr21 12968320 . 0.584144
chr21 12970435 . 0.585474
chr21 12977762 . 0.589953
...

The following error was generated: WARNING: Analysis skipped: Too few reference data for this region. (Mrefreg = 0).

I am a bit confused, because I could successfully impute the same set of files using Beagle v5.2. Could you please provide a few pointers on this warning?

Thank you so much, ksn

lwienbrandt commented 2 years ago

Hi ksn, thanks for reporting this, and sorry for the late answer. This is indeed confusing. Could you please confirm that the reference file and the target file both contain only the same chromosome (in your case apparently chr21)? If so, could you post the contents of the log file? (You could also mail it to me, if you like.) Best, Lars

lwienbrandt commented 2 years ago

Hi again, I looked into this once more, and the reason for this warning is indeed that your reference file does not sufficiently cover the region you are trying to impute. This could have one of the following causes:

  1. Your reference file contains more than one chromosome, and the one you want to analyse is not the first in that file. Background: EagleImp only reads the first chromosome from a VCF file (and a Qref file contains only one chromosome anyway). I think this is the most probable reason.
  2. A less probable reason might be that the region in your target data is indeed not well covered by the provided reference, and the analysis requires chunking, such that a chunk of target data is not covered by the reference at all. In this case, I bet that your analysis with Beagle does not deliver good results for this region either. A solution could be to either find a proper reference panel (I know, these are rare...) or to exclude this region beforehand.
  3. Another reason might be that many of your target variants were filtered out, most likely because no corresponding variant was found in the provided reference. You can see this in the .varinfo file created by an EagleImp run. If so, please have a look at the --allowRefAltSwap and --allowStrandFlip switches.
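A quick way to check the first two points before rerunning is to summarise which chromosomes each VCF contains and which position range they span. The following is only a minimal Python sketch and not part of EagleImp: the helper names `summarize_vcf` and `diagnose` are made up for illustration, and it assumes plain or gzip-compressed VCF text of which it reads only the CHROM and POS columns.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_vcf(path):
    """Return {chrom: (n_variants, min_pos, max_pos)}, in order of first appearance."""
    summary = {}
    with open_text(path) as handle:
        for line in handle:
            # Skip header lines and empty lines.
            if not line.strip() or line.startswith("#"):
                continue
            chrom, pos = line.split("\t", 2)[:2]
            pos = int(pos)
            if chrom in summary:
                n, lo, hi = summary[chrom]
                summary[chrom] = (n + 1, min(lo, pos), max(hi, pos))
            else:
                summary[chrom] = (1, pos, pos)
    return summary


def diagnose(ref_summary, target_summary):
    """Return a list of human-readable hints; an empty list means nothing obvious."""
    hints = []
    if not ref_summary or not target_summary:
        return ["one of the files contains no variant records"]

    ref_first = next(iter(ref_summary))
    target_first = next(iter(target_summary))

    # Point 1: only the first chromosome of a VCF is read.
    if len(ref_summary) > 1:
        hints.append(
            f"reference contains several chromosomes {list(ref_summary)}; "
            f"only the first one ({ref_first}) would be read"
        )
    if ref_first != target_first:
        hints.append(
            f"first chromosome differs: reference={ref_first}, target={target_first}"
        )
        return hints

    # Point 2: compare the position ranges on the shared chromosome.
    _, ref_lo, ref_hi = ref_summary[ref_first]
    _, tgt_lo, tgt_hi = target_summary[target_first]
    if max(ref_lo, tgt_lo) > min(ref_hi, tgt_hi):
        hints.append(
            f"no positional overlap on {ref_first}: "
            f"reference {ref_lo}-{ref_hi}, target {tgt_lo}-{tgt_hi}"
        )
    return hints
```

With two hypothetical file names, `diagnose(summarize_vcf("ref.vcf.gz"), summarize_vcf("sample.vcf.gz"))` returns the hints as a list of strings. An empty list does not prove that coverage is sufficient, since the sketch compares only overall ranges and not individual chunks or alleles.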

In no case does this have anything to do with the provided genetic map, as it is only required to calculate recombination probabilities during the phasing step. So, I'm closing this issue now, but please let me know if you still encounter this problem or any other problems. Thanks, Lars