lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License

Issue with paftools.js after miniprot #2

Closed charlesfeigin closed 1 year ago

charlesfeigin commented 1 year ago

Hi! Really excited to try this program out. I'm trying to map proteins from the RefSeq annotation of one species to a high-quality assembly of a close relative. I followed the example on the main GitHub page, substituting my own genome and protein FASTA files.

miniprot -ut 16 ref.fasta prots.faa >aln.paf

paftools.js paf2gff -a aln.paf > aln.gff

However, when I try to use the most recent paftools.js I get the following error:

Error: failed to find the cg:Z tag
        if (cigar == null) throw Error("failed to find the cg:Z tag");
        ^
Error: failed to find the cg:Z tag
    at Error ()
    at paf_paf2gff (/path/to/paftools.js:3212:28)
    at main (/path/to/paftools.js:3337:29)
    at /path/to/paftools.js:3342:1

When I open the paf alignment I can see cg:Z tags though. Any idea why this may be happening? Thanks!
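Rather than checking by eye, the presence of cg:Z tags can be counted per record. The following is only a sketch: the two PAF records are made up for illustration, and `count_cg_tags` is a hypothetical helper name, not part of miniprot or paftools.js. It assumes the standard PAF layout, where optional tags start at column 13.

```shell
# Made-up two-record PAF sample: the first record carries a cg:Z tag,
# the second does not. Real miniprot output will look different.
paf=$(mktemp)
printf 'prot1\t100\t0\t100\t+\tchr1\t1000\t10\t310\t300\t300\t0\tcg:Z:100M\n' > "$paf"
printf 'prot2\t80\t0\t80\t-\tchr1\t1000\t500\t740\t240\t240\t0\tms:i:400\n' >> "$paf"

# Hypothetical helper: print "<records with cg:Z> <records without cg:Z>".
# Lines starting with '#' are skipped; tags are searched from column 13 on.
count_cg_tags() {
  awk -F'\t' '
    /^#/ { next }
    {
      found = 0
      for (i = 13; i <= NF; i++) if ($i ~ /^cg:Z:/) found = 1
      if (found) has++; else lacks++
    }
    END { printf "%d %d\n", has, lacks }
  ' "$1"
}

cg_summary=$(count_cg_tags "$paf")
echo "records with / without cg:Z: $cg_summary"
rm -f "$paf"
```

If every record carries the tag and the converter still reports it missing, the problem is more likely in the converter than in the PAF file.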

lh3 commented 1 year ago

Thanks. That is a bug in paftools.js. I have fixed it. Please check out this script from minimap2 and try it again.

charlesfeigin commented 1 year ago

Thanks so much for your fast reply. Just to clarify, is the updated paftools.js the one at https://github.com/lh3/minimap2/tree/master/misc/ ? I redownloaded minimap2 from its GitHub page but am still getting the same error. Apologies.

lh3 commented 1 year ago

Sorry, my bad. I forgot to push the change. It should work now.

lh3 commented 1 year ago

@charlesfeigin Thanks for early testing. Miniprot can now directly output GFF3 with option --gff.
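A minimal sketch of the direct route, assuming `miniprot` is on the PATH and reusing the thread count from the original command. `run_miniprot_gff` is a hypothetical wrapper name, and the empty-output check is an addition for illustration, not something miniprot does itself.

```shell
# Hypothetical wrapper: align proteins to a genome and write GFF3 directly.
# Arguments: genome FASTA, protein FASTA, output GFF path.
run_miniprot_gff() {
  genome=$1
  proteins=$2
  out=$3

  # Bail out early if miniprot is not installed.
  if ! command -v miniprot >/dev/null 2>&1; then
    echo "miniprot not found on PATH" >&2
    return 127
  fi

  # -t sets the thread count; --gff requests GFF3 output instead of PAF.
  miniprot -t 16 --gff "$genome" "$proteins" > "$out" || return $?

  # Flag an empty result instead of silently passing it downstream.
  if [ ! -s "$out" ]; then
    echo "warning: $out is empty" >&2
    return 1
  fi
}
```

With the file names used earlier in this thread, the call would be `run_miniprot_gff ref.fasta prots.faa aln.gff`.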

charlesfeigin commented 1 year ago

@lh3 Happy to and thanks for adding that feature. Looking forward to testing it for some comparative genomics in a group of closely related mammals.

jdmontenegro commented 1 year ago

Hi, thanks for this tool; it is very useful. I was trying to convert the PAF output to GFF using paftools (for some reason the --gff flag in miniprot was producing empty files), but I keep getting the following error:

$ paftools.js paf2gff  GCA_009827155.1_ASM982715v1_genomic_ry_peps.paf > paftools.js paf2gff - > GCA_009827155.1_ASM982715v1_genomic_ry_peps.gff
paftools.js:3516: Error: inconsistent cigar
        if (en != t[8] - t[7]) throw Error("inconsistent cigar");
                               ^
Error: inconsistent cigar
    at Error (<anonymous>)
    at paf_paf2gff (/scratch/molevo/jmontenegro/software/bin/paftools.js:3516:32)
    at main (/scratch/molevo/jmontenegro/software/bin/paftools.js:3664:29)
    at /scratch/molevo/jmontenegro/software/bin/paftools.js:3669:1

The PAF files look good and I can convert them manually, but it seems this is a bug related to the previous one. I am using the latest paftools:

$ head -n 3 paftools.js
#!/usr/bin/env k8

var paftools_version = '2.24-r1152-dirty';

Is this the latest version or am I missing something? Thank you!

Juan D.

jdmontenegro commented 1 year ago

Never mind, the conversion problem was caused by the comment lines that contain the alignments. I fixed it by removing them first:

sed -e '/^#/d' <myPAF> | paftools.js paf2gff -a - > <myGFF>
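To see what that filter does, here is a small sketch on a made-up file. The comment lines and PAF records below are invented for illustration, and `strip_paf_comments` is a hypothetical helper name wrapping the same sed expression.

```shell
# Made-up PAF file: two records interleaved with two '#' comment lines.
paf=$(mktemp)
{
  printf '##made-up alignment line\n'
  printf 'prot1\t100\t0\t100\t+\tchr1\t1000\t10\t310\t300\t300\t0\tcg:Z:100M\n'
  printf '##another made-up alignment line\n'
  printf 'prot2\t80\t0\t80\t-\tchr1\t1000\t500\t740\t240\t240\t0\tcg:Z:80M\n'
} > "$paf"

# Hypothetical helper: drop every line that starts with '#'.
strip_paf_comments() {
  sed -e '/^#/d' "$1"
}

# Count lines before and after filtering (tr strips wc's padding).
n_all=$(wc -l < "$paf" | tr -d ' ')
n_records=$(strip_paf_comments "$paf" | wc -l | tr -d ' ')
echo "lines in file: $n_all, records after filtering: $n_records"
rm -f "$paf"
```

Only the tab-separated records survive, which is the input the converter expects.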

Where can I find documentation about what the "-a" flag means? It seems to be optional, but for miniprot PAF results it is essential.

Regards,

Juan D

lh3 commented 1 year ago

I would recommend using the --gff option of miniprot. It is faster and gives more information. paftools.js paf2gff may be deprecated in the future.

> for some reason the --gff flag was producing empty files directly in miniprot

Could you compile from source and try again?