maickrau / GraphAligner

MIT License
261 stars 32 forks source link

Alignments jumping haplotype? #89

Open skoren opened 1 year ago

skoren commented 1 year ago

I have a human sample from HPRC (HG01099) where it seems there is incorrect alignment between haplotypes. Here is the region post-resolution: Screen Shot 2023-08-08 at 1 57 18 PM You can see utig4-1010 is connected to a paternal and maternal haplotype. There is another region (with node utig4-103[01]) where two maternal nodes merge into a single maternal node. Going back to the HiFi-only graph this is the structure in the area:

Screen Shot 2023-08-09 at 7 01 10 PM I tried to label the HiFi nodes w/their utig4 assignments. You can see that there is a resolution here which would connected paternal to paternal and both maternal to their correct counterpart. However, one of the maternal paths is being broken and instead is connected to the paternal path through utig1-46519. GraphAligner has strong support for the path from utig1-44125 to utig1-36366 (24 reads) vs either utig1-42180 (2) or utig1-42179 (0). However, re-mapping the same reads w/winnowmap gives no support for this path. All the reads aligned from utig1-44125 to utig1-36366 instead align to utig1-46520 and utig1-36366. The hapmers are a bit noisy since these are ONT reads but when I take strongly paternal reads, they are mapped by graphaligner to this path (6 reads) and not by winnowmap, implying they are likely from the paternal haplotype. The results are the same with GraphAligner w/the diploid heuristic as well, the two GraphAligner commands I ran were:

GraphAligner -t 24 -g 2-processGraph/unitig-unrolled-hifi-resolved.gfa -f reads.fasta.gz -a aligns.gaf \       
  --seeds-mxm-windowsize 5000 --seeds-mxm-length 30 \
  --seeds-mem-count 10000 \
  --bandwidth 15 \
  --multimap-score-fraction 0.99 \
  --precise-clipping 0.85 \
  --min-alignment-score 5000 \
  --hpc-collapse-reads \
  --discard-cigar \
  --clip-ambiguous-ends 100 \
  --overlap-incompatible-cutoff 0.15 \
  --max-trace-count 5 \
  --mem-index-no-wavelet-tree

and

GraphAligner -t 24 -g 2-processGraph/unitig-unrolled-hifi-resolved.gfa -f reads.fasta.gz -a aligns.gaf \       
  --seeds-mxm-windowsize 5000 --seeds-mxm-length 30 \
  --seeds-mem-count 10000 \
  --bandwidth 15 \
  --multimap-score-fraction 0.99 \
  --precise-clipping 0.85 \
  --min-alignment-score 5000 \
  --hpc-collapse-reads \
  --discard-cigar \
  --clip-ambiguous-ends 100 \
  --overlap-incompatible-cutoff 0.15 \
  --max-trace-count 5 \
  --mem-index-no-wavelet-tree
  -- diploid-heuristic

The asm, relevant reads, and alignments are on the globus share under graphaligner.