waveygang / wfmash

base-accurate DNA sequence alignments using WFA and mashmap3
MIT License

v0.12.5 emits more tabs than expected #221

[Open] subwaystation opened this issue 8 months ago

subwaystation commented 8 months ago

Hi ho :)

Running

./build/bin/wfmash data/reference.fa.gz data/reference.fa.gz > aln.paf

with the current master gives me output that seems to contain either duplicated values or additional columns I don't understand:

head aln.paf
sample  1399930 0       1399930 +       sample  1399930 0       1399930 1399930 1399930 255     gi:f:100        bi:f:100        md:f:100        wt:i:244        pt:i:14 aa:i:21870      ap:i:10935      cg:Z:1399930=
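Only the first 12 tab-separated columns are the mandatory PAF fields described in PAF.md. Everything after them is an optional SAM-style `TAG:TYPE:VALUE` field, so this record has 20 columns in total (12 mandatory plus 8 tags). A minimal Python sketch that splits the record accordingly. The column names are descriptive labels following PAF.md's column descriptions, not identifiers used by wfmash, and `split_paf_line` is a hypothetical helper:

```python
# Descriptive labels for the 12 mandatory PAF columns, in order, following
# the column descriptions in PAF.md. Anything after column 12 is an optional tag.
PAF_COLUMNS = [
    "query_name", "query_length", "query_start", "query_end", "strand",
    "target_name", "target_length", "target_start", "target_end",
    "residue_matches", "block_length", "mapping_quality",
]

def split_paf_line(line):
    """Split one PAF record into (dict of mandatory columns, list of tag strings)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    mandatory = dict(zip(PAF_COLUMNS, fields[:12]))
    tags = fields[12:]  # optional TAG:TYPE:VALUE fields, however many there are
    return mandatory, tags

# The record from `head aln.paf`, with the tab separators restored.
record = "\t".join([
    "sample", "1399930", "0", "1399930", "+",
    "sample", "1399930", "0", "1399930",
    "1399930", "1399930", "255",
    "gi:f:100", "bi:f:100", "md:f:100", "wt:i:244",
    "pt:i:14", "aa:i:21870", "ap:i:10935", "cg:Z:1399930=",
])

mandatory, tags = split_paf_line(record)
print(mandatory["mapping_quality"])  # 255
print(len(tags))                     # 8
```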

Could you please explain all the columns and update https://github.com/waveygang/wfmash/blob/main/scripts/split_approx_mappings_in_chunks.py#L40 accordingly? Thanks!

AndreaGuarracino commented 8 months ago

Those are not duplicated values. Because query and target are identical, every identity measure is 100%.
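The 12 mandatory columns already show why: column 10 (number of residue matches) equals column 11 (alignment block length), so their ratio is exactly 100%. A quick arithmetic check, assuming identities are reported as percentages, as the `gi:f:100` value suggests. The thread does not define wfmash's `gi`, `bi` and `md` tags, so this only reproduces the standard column-10/column-11 ratio, and `match_percentage` is a hypothetical helper:

```python
def match_percentage(residue_matches, block_length):
    """Percent of the alignment block made up of matching residues (PAF cols 10 / 11)."""
    return 100.0 * residue_matches / block_length

# Columns 10 and 11 of the self-alignment record: every aligned base matches.
print(match_percentage(1399930, 1399930))  # 100.0
```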

I'm making split_approx_mappings_in_chunks.py more robust in https://github.com/waveygang/wfmash/pull/222
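A common way a PAF-reading script breaks when a tool starts emitting extra tags is fixed-width tuple unpacking of `line.split("\t")`. Whether the linked line of `split_approx_mappings_in_chunks.py` did exactly this is an assumption, not something the thread states, and PR 222 may take a different approach. The sketch below only contrasts the two patterns. `parse_strict` and `parse_robust` are hypothetical names:

```python
def parse_strict(line):
    # Fragile: assumes exactly 13 columns (12 mandatory + one tag), so any
    # additional tag makes the unpacking raise ValueError.
    (qname, qlen, qstart, qend, strand, tname, tlen,
     tstart, tend, matches, block_len, mapq, cigar) = line.rstrip("\n").split("\t")
    return qname, int(qstart), int(qend)

def parse_robust(line):
    # Robust: index only the mandatory columns and ignore however many
    # optional tags follow them.
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[2]), int(fields[3])

# A record with three optional tags (15 columns in total).
line = "\t".join(["sample", "1399930", "0", "1399930", "+",
                  "sample", "1399930", "0", "1399930",
                  "1399930", "1399930", "255",
                  "gi:f:100", "wt:i:244", "cg:Z:1399930="])

print(parse_robust(line))  # ('sample', 0, 1399930)
try:
    parse_strict(line)
except ValueError as err:
    print("strict parser failed:", err)
```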

subwaystation commented 8 months ago

Makes sense!

Could you still please explain all the columns? I suspect https://github.com/lh3/miniasm/blob/master/PAF.md is not sufficient here.
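Even without knowing what each wfmash-specific tag means, all of them follow the same SAM-style `TAG:TYPE:VALUE` layout, so they can be decoded generically. The `cg:Z:` value has CIGAR syntax (a length followed by an operation; here `1399930=` means 1,399,930 sequence matches). A minimal sketch, covering only the `i`, `f` and `Z` type codes that appear in the record, plus a CIGAR decoder. `parse_tags` and `parse_cigar` are hypothetical helpers, and unknown type codes are simply kept as strings:

```python
import re

# Converters for the SAM-style type codes seen in the wfmash record.
TYPE_CONVERTERS = {"i": int, "f": float, "Z": str}

def parse_tags(tag_fields):
    """Turn ['gi:f:100', ...] into {'gi': 100.0, ...}; unknown types stay strings."""
    tags = {}
    for field in tag_fields:
        name, type_code, value = field.split(":", 2)  # Z values may contain ':'
        tags[name] = TYPE_CONVERTERS.get(type_code, str)(value)
    return tags

def parse_cigar(cigar):
    """Decode a CIGAR string such as '10=1X5=' into [(10, '='), (1, 'X'), (5, '=')]."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

tag_fields = ["gi:f:100", "bi:f:100", "md:f:100", "wt:i:244",
              "pt:i:14", "aa:i:21870", "ap:i:10935", "cg:Z:1399930="]
tags = parse_tags(tag_fields)
print(tags["wt"], tags["gi"])   # 244 100.0
print(parse_cigar(tags["cg"]))  # [(1399930, '=')]

# In this record, the total of '=' operations equals PAF column 10.
matches = sum(n for n, op in parse_cigar(tags["cg"]) if op == "=")
print(matches == 1399930)  # True
```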