lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences
https://lh3.github.io/minimap2

Output of .paf and qname issue #1088

Closed by morgan-sparks 1 year ago

morgan-sparks commented 1 year ago

I am using ONT long reads to try and find breakpoints for a large putative chromosomal inversion in our study organism (salmon). My end goal is to generate a dotplot from my .paf file to give a rough sense of where breakpoints are.

I used the command

minimap2 -x ava-ont $GENOME $READS/barcode09.fastq.gz > barcode09.OgorEvenAss.approx-mapping.paf

to generate a .paf file between the species assembly (a fasta file) and our raw reads (a fastq file).

However, the qname in my .paf file is the parent_read_id from the fastq file.

For example (first few lines of my .paf):

74baa321-d50a-48f8-a8b8-dbe2a90b3708    484 103 436 +   NC_060193.1 60884844    44630007    44630343    262 336 0   tp:A:S  cm:i:55 s1:i:261    dv:f:0.0239 rl:i:202
40627972-bcb1-47d4-aad4-4f1a8c2f2535    3205    93  3118    -   NC_060196.1 66433064    36518740    36521756    2167    3050    0   tp:A:S  cm:i:521    s1:i:2154   dv:f:0.0294 rl:i:681
40627972-bcb1-47d4-aad4-4f1a8c2f2535    3205    1370    1812    +   NC_060194.1 52623370    39483873    39484307    122 445 0   tp:A:S  cm:i:14 s1:i:118    dv:f:0.1173 rl:i:681
40627972-bcb1-47d4-aad4-4f1a8c2f2535    3205    1370    1812    -   NC_060195.1 72202100    65678923    65679365    122 452 0   tp:A:S  cm:i:15 s1:i:117    dv:f:0.1132 rl:i:681
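
For reference, the first 12 columns of a PAF line are: query name, query length, query start, query end, strand, target name, target length, target start, target end, number of matching bases, alignment block length, and mapping quality. The following is a minimal Python sketch of reading those columns from the first line above. The names `PafRecord` and `parse_paf_line` are made up for illustration and are not part of minimap2 or paftools.js. It splits on any whitespace because the pasted example may have lost its tabs.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns (hypothetical helper type)."""
    qname: str
    qlen: int
    qstart: int
    qend: int
    strand: str
    tname: str
    tlen: int
    tstart: int
    tend: int
    nmatch: int
    alen: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    # PAF is tab-delimited; split() also copes with tabs pasted as spaces.
    f = line.split()
    if len(f) < 12:
        raise ValueError("a PAF line needs at least 12 columns")
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )


# First line of the example output above.
example = (
    "74baa321-d50a-48f8-a8b8-dbe2a90b3708 484 103 436 + "
    "NC_060193.1 60884844 44630007 44630343 262 336 0 "
    "tp:A:S cm:i:55 s1:i:261 dv:f:0.0239 rl:i:202"
)
rec = parse_paf_line(example)
# qname is the read ID; tname is the sequence name from the assembly.
print(rec.qname, rec.tname, rec.tstart, rec.tend)
```

Read this way, the target columns in the example already carry the sequence names from the assembly (e.g. NC_060193.1), while the query column carries the read ID from the fastq file.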

I also attempted the same approach from my SAM files, using the following command, and received essentially the same results:

paftools.js sam2paf barcode06.aln.sam > barcode06.aln.paf

Is the only way to get a .paf file with chromosome or contig names in the qname and tname columns to use an assembled .fasta file?

lh3 commented 1 year ago

Is the only way to get a .paf file with chromosome or contig names in the qname and tname columns to use an assembled .fasta file?

Yes