jonathanBieler / BurrowsWheelerAligner.jl

Binding for bwa
MIT License
8 stars 0 forks source link

Suggestions #5

Open mashu opened 7 months ago

mashu commented 7 months ago

Hi,

Thanks again for this package, this is extremely useful.

The code to get full description

description(aln::BWA.LibBWA.mem_aln_t, aligner::BWA.Aligner) = begin
        anns = unsafe_load(aligner.index.bns).anns
        anno = unsafe_load(anns, aln.rid+1).anno
        unsafe_string(anno)
end

Works just fine.

I have a small feature request, would it be possible also to access aligned query and target ? Is that exposed somehow ? The problem I am trying to solve is count the number of differences from the reference, but score includes in matches also mismatches, I think it's only counting insertions and deletions as mismatches?

Thanks!

jonathanBieler commented 7 months ago

Thanks, I've push the description method (not released yet though).

I'm not sure about the edit distance, I think you would need the MD tag for that but I can't see it in what I have here. BWA is not really documented so it's not easy to add stuff, it seems there's a method to compute a MD but not sure if I can call it :

https://github.com/lh3/bwa/blob/139f68fc4c3747813783a488aef2adc86626b01b/bwase.c#L315

Maybe the easiest would be to load the reference genome in memory, and then run a local alignement in Julia on the read + part of the genome where the read maps.

Otherwise it seems bwa_cal_md1 is just looping through the alignment as defined by the CIGAR and comparing to the reference genome, so you could also do that in Julia relatively easily.

mashu commented 7 months ago

I suspect that you can't get anything like Levenshtein distance form BWA. I was also thinking to align myself with BioAlignments but wasn't sure how to get easily reference sequence to which stuff is aligned.