lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences
https://lh3.github.io/minimap2

PAF records truncated? #345

nhansen closed this issue 5 years ago

nhansen commented 5 years ago

I'm running minimap2 version 2.15-r905 on polished and unpolished contigs produced by wtdbg2. For sequences that are mostly or entirely lower case, minimap2 is writing PAF alignment records with only two fields (the read name and the read length). Is this expected behavior? When I attempt to analyze the file with paftools.js, I get an error because the quality field is not an integer.

minimap2 --paf-no-hit -cx asm20 -r2k -z1000,500 -t 2 hg19.fa myseq.fasta > output.paf

Thanks! --Nancy

lh3 commented 5 years ago

PAF normally doesn't contain unmapped reads. The undocumented option --paf-no-hit outputs these reads in a shorter format. What paftools.js command line are you using?
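
To illustrate what the shorter format means for downstream parsing: a full PAF record has at least 12 tab-separated columns, while the lines described in this thread carry only the read name and the read length. The Python sketch below separates the two kinds of lines before any further analysis. The helper name split_paf_records is hypothetical and is not part of minimap2 or paftools.js; it assumes tab-separated input and the two-field layout reported above.

```python
from typing import Iterable, List, Tuple

PAF_MIN_FIELDS = 12  # number of mandatory columns in a full PAF record


def split_paf_records(
    lines: Iterable[str],
) -> Tuple[List[List[str]], List[Tuple[str, int]]]:
    """Separate full PAF records from two-field 'no hit' lines.

    Hypothetical helper for illustration only. Returns (mapped, unmapped),
    where mapped holds the split columns of each full record and unmapped
    holds (read_name, read_length) pairs.
    """
    mapped: List[List[str]] = []
    unmapped: List[Tuple[str, int]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) >= PAF_MIN_FIELDS:
            mapped.append(fields)
        elif len(fields) == 2:
            # Short line as described in this thread: name and length only.
            unmapped.append((fields[0], int(fields[1])))
        else:
            raise ValueError(f"unexpected number of columns: {len(fields)}")
    return mapped, unmapped
```

A tool that indexes columns such as the mapping quality (column 12) without a check like this will fail on the two-field lines, because those columns do not exist there.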

nhansen commented 5 years ago

Ah--I was attempting to do assembly statistics as in the wtdbg2 preprint, but didn't realize I could only use the output with "paftools.js asmstat -q50000 -d.1". Thanks!

--Nancy

lh3 commented 5 years ago

What is the paftools.js command line that triggers the error? I would like all paftools.js functionality to work with such truncated lines.

nhansen commented 5 years ago

I was just running "paftools.js stat" on the resulting paf file, and got the following error:

[nhansen@biowulf consensus_seqs]$ paftools.js stat HG00732.wtdbg_contigs.minimap.asm20r2kz1k500.hg19.paf
/usr/local/apps/minimap2/2.15/misc/paftools.js:999: TypeError: Cannot read property 'length' of undefined
    aqlen = t[9].length;
                 ^
TypeError: Cannot read property 'length' of undefined
    at paf_stat (/usr/local/apps/minimap2/2.15/misc/paftools.js:999:17)
    at main (/usr/local/apps/minimap2/2.15/misc/paftools.js:2496:26)
    at /usr/local/apps/minimap2/2.15/misc/paftools.js:2512:1

lh3 commented 5 years ago

This will be addressed in the next release I am working on. Thanks a lot for the report.

lh3 commented 5 years ago

The new release doesn't output such truncated lines any more. It instead outputs a line like

LKHW01002830.1  34387  0     0     *     *     0     0     0     0     0     0     rl:i:33513

The stat and asmstat commands have been modified to recognize both the truncated lines and the new full-length lines, though other paftools.js commands haven't been updated. If you have new problems, please create a new issue or reopen this one. Thanks.
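
For scripts outside paftools.js, one way to accept both forms is sketched below in Python. It treats a line as unmapped if it has exactly two columns (the old truncated form) or if it has the full 12 columns with `*` in the target name column, as in the example line above. This is an assumption about how to detect such records, not a description of what paftools.js does internally; the function name parse_unmapped is hypothetical, and the input is assumed to be tab-separated.

```python
from typing import Optional, Tuple

TARGET_NAME_COL = 5  # 0-based index of the target sequence name in PAF


def parse_unmapped(line: str) -> Optional[Tuple[str, int]]:
    """Return (query_name, query_length) for an unmapped record, else None.

    Hypothetical helper: accepts both the old two-field line and the
    newer full-length line whose target name is '*'.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 2:
        # Old truncated form: name and length only.
        return fields[0], int(fields[1])
    if len(fields) >= 12 and fields[TARGET_NAME_COL] == "*":
        # Full-length form with '*' as the target name (assumed marker).
        return fields[0], int(fields[1])
    return None  # a regular alignment record, or something unrecognized
```

Checking the column count before indexing avoids the kind of failure shown in the stack trace earlier in this thread, where a column was read from a line that did not have it.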