Open SZ-qing opened 1 year ago
I don't think there's currently any simple way to do what you're asking in VG, but it might be possible with enough fiddling. For a k-mer based algorithm to be guaranteed to find mappings with 2 mismatches in a sequence of length 27, you would need to set k = 9 (i.e. 27 / (2 + 1)). In the human genome, nearly all 9-mer matches will be spurious. Matches only start to become unique around ~15 bp. My guess is that in setting k = 9, you will also have to adjust many other parameters (especially the --hit-cap
and --hard-hit-cap
) to get vg giraffe
to work well. Even then, I expect the mapping to be quite slow, because it will be bogged down in spurious alignments. That said, if you're interested in experimenting with it, it might be possible to troubleshoot a functional pipeline.
I have a sequence of
27bp length
, and I want to find out theexact match of this sequence, and all the sequences that have 1 mismatch and 2 mismatch
, by the method of graph.I used
bowtie1
to find this sequence in thelinear HG38 genome
with1 sequence with one mismatch
and12 sequences with two mismatches
. But I compared it in graph usinggiraffe
and it came up with one paths, so I'm wondering if the graph reference genome can accomplish what I did in the linear genome?linear HG38 results:
human pangenome:
command line is:
vg giraffe -Z ../hprc-v1.1-mc-grch38.gbz -m ../hprc-v1.1-mc-grch38_k16_w6.min -p -f ./input.fq --max-alignments 10000 -o gam --max-multimaps 10000 > ./output_k16_w6_aln_M10000.gam
graph file was downloaded from: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38
Thanks!!