lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License

Information function --trans #45

Closed: rlibouba closed this issue 1 year ago

rlibouba commented 1 year ago

Hello,

I'd like to know more about the new --trans option. When I combine it with the --gtf output format, I get an output file containing lines in both PAF and GTF formats. Is this intended? Is there any way to separate the two kinds of output?

Here are the first 6 lines of my output file (produced with the --trans and --gtf options):

> tr|I6YGH7|I6YGH7_MYCTU    377 0   375 -   NC_000962.3 699930  326271  327381  396 1158    0   AS:i:516    ms:i:512    np:i:197    da:i:9  do:i:3  cg:Z:17M1D20M14I2M1I24M1I195M4D61M6D40M cs:Z:*gtaM:1*atcL*gacN*ctgF:1*cccD*tctE:1*gagL*aagA*ctgF*cggQ:1*cagE*atcV:1-gcc:1*gtcF*gccL:1*ctgA*aagN:1*atgA*ccgS*cgaI*gagP*ccgT*cgcK*actS*gtcY*gcgD*atcN:3+FAQHRYWDRVLFDA:1*tggL+S:1*ctgI*ccaT*tatW*ctgP*cccA:1*ccgY*tggG:2*gccD:1*agcP*ccgL*gtcL*gagH*cagW:1*atcV*atcF*gccE*cagE:1+Y:1*accR:2*cggA*gtcP*aagG:1*ccgA*cagS*attA*gccN*atcG*gcgT*acgS*tggM*atcL*gtgA:1*tcgT*atcL*gtcF:1*ttcH:2*gacA*aatE:1*aagL*cagD:1*ctcI:2*ccaK*acgM*ttcA*cgaS:1*gacE*attQ*ttcI:1*tgcA:1*ctgA*ttcW:3*ggcE*gcgS:7*gcaR*accS*aagT:1*accS*cggK:5*cgcL*atcL*actN:5*accS*accS*ggcR:1*cagP*tacF*tccA*caaD*tggM:1*gcgF*ctgG:1*gcgF:1*acgS:2*tcgA*gcgV*cccE*aagR:1*aacR:1*atcL:3*ctgM*ctgF:1*atgL:1*agcA*gaaK:2*caaT:1*aagR:1*ctgI*cgcA*gagQ:1*acgG:1*aagD*gagT*tttG:1*aacG*accE*gtcI*tacF:8*gagR*ttgD:1*ctcI:1*gagA*gtgP:1*cggD:2*gagR*gtcA*agcA*cgcM*aacS:1*ctgS*acgS*gccN:2*gtgG*tcgM*atcS*ggcL*ggcR:1*gatP*tcgA*accR:2*cccA*accS*ctgA*ggcE*gagR*ttcL:1*gacQ*ttcL*gttW*cgcK:1*tacR*cgtG*ttcS*gaaP*ggaP*cagE:1-gaccaggtcgcg*cgaA*cacD:1*gccV*gggA*caaD*ttgA*atcW*gccI*gagK*ggcA*cacQ:1*accY*aagR:1*ctcQ*aacT*ttgF*cgcG*tccT*acgV*ctgT*ttgR*acgL*ctgA:3*gacE*ccgL*atgG:1*ccgE*gcgS*gcgS*atcV*tccT:1*ctgV*ttgF*tccW*atgS*cgcE*accL*ggcD*cagV*ggtH*tatL*gccH*gaaQ*ttcT:1*gtgL*tcgD*tccL*tttR:1*accA:1*gcgG-gtgatcggcgacacagag*cgaE:1*cccA:1*aagP:1*ggcT:1*tacG:2*gccF*agcA*cggL*gccG*accG*accP:2*gggA:2*tcgN:1*gtcI:1*ctcR:12*gacE
##STA   VDIDLDPSTEKLRAQIRAEVAALKAMPREPRTVAIAEGGWVLPYLPKPWGRAASPVEQIIIAQEFTAGRVKRPQIAIATWIVPSIVAFGTDNQKQRLLPPTFRGDIFWCQLFSEPGAGSDLASLATKATRVDGGWRITGQKIWTTGAQYSQWGALLARTDPSAPKHNGITYFLLDMKSEGVQVKPLRELTGKEFFNTVYLDDVFVPDELVLGEVNRGWEVSRNTLTAERVSIGGSDSTFLPTLGEFVDFVRDYRFEGQFDQVARHRAGQLIAEGHATKLLNLRSTLLTLAGGDPMAPAAISKLLSMRTGQGYAEFAVSSFGTDAVIGDTERLPGKWGEYLLASRATTIYGGTSEVQLNIIAERLLGLPRDP
NC_000962.3 miniprot    gene    326272  327381  512 -   .   gene_id "MPG000001";
NC_000962.3 miniprot    transcript  326272  327381  512 -   .   transcript_id "MPT000001"; gene_id "MPG000001";
NC_000962.3 miniprot    exon    326272  327381  512 -   .   transcript_id "MPT000001"; gene_id "MPG000001";
NC_000962.3 miniprot    CDS 326272  327381  512 -   0   transcript_id "MPT000001"; gene_id "MPG000001";
> 

Thank you in advance. Have a nice day, Romane

lh3 commented 1 year ago

This is a bug. It is now fixed on GitHub HEAD. The "--gff" output is not affected.
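For a file already produced by an affected build, the mixed lines can probably be separated after the fact. The following is only a minimal sketch, not part of miniprot: the function name `split_mixed_output` is hypothetical, and it assumes the real output is tab-separated, that GTF records have exactly 9 columns with numeric start/end in columns 4 and 5 and a strand in column 7, and that everything else (the PAF record and the `##STA` line) belongs together.

```python
def split_mixed_output(lines):
    """Split mixed miniprot output into (gtf_lines, other_lines).

    Assumptions (not taken from miniprot's documentation):
    - fields are tab-separated;
    - a GTF record has exactly 9 columns, integer start/end in
      columns 4-5 and a strand of '+', '-' or '.' in column 7;
    - PAF records (12+ columns) and '##' lines go to other_lines.
    """
    gtf_lines, other_lines = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        fields = line.split("\t")
        is_gtf = (
            not line.startswith("#")
            and len(fields) == 9
            and fields[3].isdigit()
            and fields[4].isdigit()
            and fields[6] in {"+", "-", "."}
        )
        if is_gtf:
            gtf_lines.append(line)
        else:
            other_lines.append(line)
    return gtf_lines, other_lines
```

Reading the mixed file line by line, passing the lines to this function, and writing the two returned lists to separate files should recover a clean GTF file. Rebuilding from GitHub HEAD, where the bug is fixed, remains the more reliable option.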