poruloh / Eagle

Haplotype phasing software
64 stars 15 forks source link

running eagle with hg38 aligned vcf #16

Closed jjfarrell closed 5 years ago

jjfarrell commented 6 years ago

I am trying to run the eagle 2.4 software with an hg38 aligned vcf. I am getting the error: ERROR: Physical and genetic distance ranges must be positive (see below).

I noticed that the genetic_map_hg38_withX.txt.gz uses contig names as 1-23 instead of chr1-chr22,chrX. which is what is found in the hg38 vcf. Could that be the underlying issue?

I tried changing the genetic map to include chr but that generates the following error: ERROR: Chromosome 22 is not in genetic map

Any suggestions on this?

Here are the command lines and output....

Using original genetics map.

eagle \
    --vcf ukbb.hg38/chr22.hg38.sorted.vcf.bgz \
    --geneticMapFile /share/pkg/eagle/2.4/install/data/tables/genetic_map_hg38_withX.txt.gz \
    --outPrefix ukbb.hg38/chr22.hg38.phased \
    --vcfOutFormat z \
    --numThreads 8 \
    --chrom chr22

Setting number of threads to 8

=== Reading genotype data ===

Reading genotypes for N = 486060 samples
Read M = 11855 variants
Filling in genetic map coordinates using reference file:
  /share/pkg/eagle/2.4/install/data/tables/genetic_map_hg38_withX.txt.gz
Physical distance range: -15268910 base pairs
Genetic distance range:  0 cM
Average # SNPs per cM:   -2147483648
ERROR: Physical and genetic distance ranges must be positive
       First SNP: chr=22 pos=15329305 cM=0
       Last SNP:  chr=22 pos=60395 cM=0

Using genetics map with chr1-chr22,chrX for contig names.....

eagle \
    --vcf ukbb.hg38/chr22.hg38.sorted.vcf.bgz \
    --geneticMapFile genetic_map_hg38_withX.txt.gz \
    --outPrefix ukbb.hg38/chr22.hg38.phased \
    --vcfOutFormat z \
    --numThreads 8 \
    --chrom chr22

Setting number of threads to 8

=== Reading genotype data ===

Reading genotypes for N = 486060 samples
Read M = 11855 variants
Filling in genetic map coordinates using reference file:
  genetic_map_hg38_withX.txt.gz
ERROR: Chromosome 22 is not in genetic map
jjfarrell commented 6 years ago

For the map interpolation, It looks like the function below expects the chr position to be either int,int or float,float. But for hg38, the chr would be a string.

double MapInterpolater::interp(int chr, int bp) const {
    if (chrBpToRateGen.empty()) return 0;
    map < pair <int, int>, pair <double, double> >::const_iterator ubIter =
      chrBpToRateGen.upper_bound(make_pair(chr, bp)); // map record > (chr, bp)
    if (ubIter == chrBpToRateGen.end()) {
      cerr << "ERROR: Chromosome " << chr << " is not in genetic map" << endl;
      exit(1);
    }
    int ubChr = ubIter->first.first;
    int ubBp = ubIter->first.second;
    double ubRate = ubIter->second.first;
    double ubGen = ubIter->second.second;

    if (chr == ubChr) return 0.01 * (ubGen + 1e-6 * ubRate * (bp-ubBp)); // interpolate interval
    else return 0.01 * (--ubIter)->second.second; // end of previous chromosome
  }
}
poruloh commented 6 years ago

Could you check the last variant in your vcf file, e.g., with the following command?

bcftools view -G ukbb.hg38/chr22.hg38.sorted.vcf.bgz | tail -1

It appears to have a position before the first variant, which is invalid.

jjfarrell commented 6 years ago

The last variant is chr22_KI270928v1_alt 60395 rs116271561 C T . . PR

poruloh commented 6 years ago

Sounds like your file includes a contig chr22_KI270928v1_alt. Could you try eliminating all variants on this contig (restricting to variants on the contig 'chr22') and rerunning?

jjfarrell commented 6 years ago

Thanks! The vcf was filtered to only include chr22. The new run did not trigger that error and is now running step 3. Looks like this will let me workaround the issue.

The problematic vcf is a liftover of chr22 from 37 to 38 and then sorted. As a result of the liftover, it includes variants that were remapped to other contigs found in hg38 (eg. chr22_KI270928v1_alt) .

poruloh commented 6 years ago

Glad you solved the problem.

I will plan to add an error-check for unusual contigs in the next release.