maickrau / GraphAligner

Aligning HiFi reads to the assembly graph generated by Flye #99

Closed Wenfei-Xian closed 6 months ago

Wenfei-Xian commented 6 months ago

Hi Mikko,

I am trying to align HiFi reads to the assembly graph generated by Flye (chloroplast genome). Judging by the 10th and 11th columns of the GAF output (number of matches and alignment block length), the longest alignment is only about 7.1 kb, and most alignments are shorter than 1 kb.

The command I used is listed below:

GraphAligner -g 11C1.q20.fastq.gz.chloroplast.reads.fasta.filter.2000.round1.both.chloroplast.gfa -f 11C1.q20.fastq.gz.mitochondrial.reads.minimap2.fasta.filter.fasta -x vg -a zz.gaf -t 128 --precise-clipping 0.95
sort -nk 11 zz.gaf | tail
m64079_191127_192615/88801300/ccs   13769   1864    2800    +   <edge_3 26258   500 1426    910 938 60  NM:i:28 AS:f:376.28 dv:f:0.0298507  id:f:0.970149   cg:Z:58=1X32=1X16=1X6=1X9=1X70=1I18=1I1=2I9=1D34=1I162=1X1I47=1X18=1X49=1X148=1X15=4I22=1X4=1X72=1I2=1I24=1X13=1D60=1X21=
m64079_191127_192615/9438700/ccs    14544   2314    3252    +   <edge_3 26258   500 1426    912 938 60  NM:i:26 AS:f:418.26 dv:f:0.0277186  id:f:0.972281   cg:Z:58=1X32=1X16=1X6=1X9=1X88=1I1=2I114=1I92=1X47=1X18=1X49=1X115=1I33=1X15=4I10=1I12=1X4=1X72=1I2=1I24=1X74=1X21=
m64079_191127_192615/98109151/ccs   15950   1169    2106    +   <edge_3 26258   500 1426    909 938 60  NM:i:29 AS:f:357.29 dv:f:0.0309168  id:f:0.969083   cg:Z:58=1X32=1X16=1X6=1X9=1X88=1I1=2I11=1X187=1D6=1X47=1X8=1X9=1X49=1X45=1I8=1I65=1I30=1X15=4I22=1X4=1X72=1I2=1I24=1X74=1X21=
m64079_191127_192615/15860139/ccs   15154   8666    9605    +   <edge_3 26258   500 1426    912 939 60  NM:i:27 AS:f:399.27 dv:f:0.028754   id:f:0.971246   cg:Z:58=1X32=1X16=1X6=1X9=1X12=1I76=1I1=2I114=1I92=1X45=1X2=1I18=1X35=1I14=1X148=1X15=4I22=1X4=1X72=1I2=1I24=1X74=1X21=
m64079_191127_192615/159385635/ccs  12386   5104    6041    +   <edge_3 26258   500 1426    909 939 60  NM:i:30 AS:f:337.3  dv:f:0.0319489  id:f:0.968051   cg:Z:58=1X32=1X16=1X4=1D1=1X9=1X64=1I24=1I1=2I5=1I114=1D86=1X47=1X18=1X43=1I6=1I149=1X15=4I22=1X4=1X72=1I2=1I24=1X74=1X19=2X
m64079_191127_192615/92406696/ccs   18439   15413   16351   +   <edge_3 26258   500 1426    911 939 60  NM:i:28 AS:f:378.28 dv:f:0.029819   id:f:0.970181   cg:Z:38=1I20=1X32=1X16=1X6=1X9=1X88=1I1=2I206=1X47=1X18=1X49=1X18=2I26=1D103=1X15=4I22=1X4=1X72=1I2=1I24=1X22=1I52=1X21=
m64079_191127_192615/82576209/ccs   12607   11660   12600   +   >edge_3 26258   24832   25763   915 940 60  NM:i:25 AS:f:440.25 dv:f:0.0265957  id:f:0.973404   cg:Z:21=1X74=1X28=2I70=1X4=1X27=4I10=1X148=1X49=1X18=1X47=1X214=3I81=1X9=1X6=1X16=1X32=1X58=1X3=1X
m64079_191127_192615/161809391/ccs  14304   10863   11800   +   <edge_3 26258   500 1426    908 941 60  NM:i:33 AS:f:277.33 dv:f:0.0350691  id:f:0.964931   cg:Z:58=1X32=1X16=1X3=1D2=1X9=1X37=1D50=1I1=2I18=1I6=1I37=1I72=1D28=1I44=1X47=1X18=1X33=1I16=1X97=1D5=1I45=1X15=4I22=1X4=1X72=1I2=1I24=1X74=1X21=
m64079_191127_192615/7995843/ccs    14433   2945    3884    +   <edge_3 26258   500 1426    909 942 60  NM:i:33 AS:f:279.33 dv:f:0.0350318  id:f:0.964968   cg:Z:58=1X32=1X8=1I8=1X6=1X9=1X88=1I1=2I6=1I159=1D40=1X14=1I33=1X18=1X23=1I26=1X29=1I42=1D13=1I2=1I61=1X15=4I22=1X4=1X72=1I2=1I10=1D13=1X74=1X21=
m64079_191127_192615/44631304/ccs   16058   0   7114    +   <edge_2 83968   12248   19365   7112    7117    60  NM:i:5  AS:f:7014.05    dv:f:0.000702543    id:f:0.999297   cg:Z:688=1D1840=1D1067=1D3517=2X
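
The short alignments in the output above can also be checked by computing how much of each read is covered by its alignment. The following is a minimal sketch, assuming the GAF file is tab-separated with the standard column order (read name, read length, query start, query end, strand, path, path length, path start, path end, matches, block length, mapping quality). The names `GafRecord`, `parse_gaf_line`, `query_coverage` and `read_gaf` are hypothetical helpers written for this illustration and are not part of GraphAligner.

```python
from typing import Iterator, NamedTuple


class GafRecord(NamedTuple):
    read: str
    read_len: int   # column 2
    q_start: int    # column 3
    q_end: int      # column 4
    path: str       # column 6
    matches: int    # column 10
    block_len: int  # column 11


def parse_gaf_line(line: str) -> GafRecord:
    """Parse the mandatory columns of one tab-separated GAF line."""
    f = line.rstrip("\n").split("\t")
    return GafRecord(f[0], int(f[1]), int(f[2]), int(f[3]),
                     f[5], int(f[9]), int(f[10]))


def query_coverage(rec: GafRecord) -> float:
    """Fraction of the read spanned by this alignment."""
    return (rec.q_end - rec.q_start) / rec.read_len


def read_gaf(path: str) -> Iterator[GafRecord]:
    """Yield one record per non-empty line of a GAF file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_gaf_line(line)


# Mandatory columns of two records copied from the output above,
# re-joined with tabs (the optional tags are left out for brevity).
EXAMPLE = [
    "\t".join("m64079_191127_192615/88801300/ccs 13769 1864 2800 + "
              "<edge_3 26258 500 1426 910 938 60".split()),
    "\t".join("m64079_191127_192615/44631304/ccs 16058 0 7114 + "
              "<edge_2 83968 12248 19365 7112 7117 60".split()),
]

records = [parse_gaf_line(line) for line in EXAMPLE]
for rec in records:
    print(rec.read, rec.block_len, round(query_coverage(rec), 3))
```

For the two records shown, the alignment spans roughly 7% and 44% of the read. To run this over the whole file, replace `EXAMPLE` with `read_gaf("zz.gaf")`.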

Many thanks!

Wenfei-Xian commented 6 months ago

Oh, I had aligned the mitochondrial reads to the chloroplast graph... When I mapped the mitochondrial reads to the mitochondrial graph, it worked very well 👍

GraphAligner -g assembly_graph.gfa -f ../11C1.q20.fastq.gz.mitochondrial.reads.minimap2.fasta.filter.fasta -t 128 -a z.gaf -x vg
GraphAligner bioconda 1.0.17-
GraphAligner bioconda 1.0.17-
Load graph from assembly_graph.gfa
Build alignment graph
Build minimizer seeder from the graph
Minimizer seeds, length 15, window size 20, density 10
Seed cluster size 1
Extend up to 5 seed clusters
Alignment bandwidth 10
Clip alignment ends with identity < 66%
X-drop DP score cutoff 14705
Backtrace from 10 highest scoring local maxima per cluster
write alignments to z.gaf
Align
Alignment finished
Input reads: 17880 (246708087bp)
Seeds found: 71995471
Seeds extended: 89392
Reads with a seed: 17880 (246708087bp)
Reads with an alignment: 17880 (246097004bp)
Alignments: 18341 (246137146bp) (1476 additional alignments discarded)
End-to-end alignments: 17262 (237879489bp)
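
The summary counts at the end of this log can be turned into fractions to quantify "worked very well". The following is a minimal sketch, assuming the summary lines keep the `Label: count (Nbp)` shape shown above. `parse_summary` and `ratio` are hypothetical helper names used only for this illustration, and the sample text is copied from the log.

```python
import re
from typing import Dict, Tuple

# Summary lines copied from the GraphAligner log above.
LOG = """\
Input reads: 17880 (246708087bp)
Seeds found: 71995471
Seeds extended: 89392
Reads with a seed: 17880 (246708087bp)
Reads with an alignment: 17880 (246097004bp)
Alignments: 18341 (246137146bp) (1476 additional alignments discarded)
End-to-end alignments: 17262 (237879489bp)
"""

# Matches lines of the form "Label: <count> (<bases>bp)".
SUMMARY_RE = re.compile(r"^(.+?): (\d+) \((\d+)bp\)")


def parse_summary(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each summary label to a (count, base pairs) tuple."""
    stats = {}
    for line in text.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            stats[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return stats


def ratio(stats: Dict[str, Tuple[int, int]], num: str, den: str, field: int) -> float:
    """Ratio of two summary entries; field 0 is the count, field 1 the base pairs."""
    return stats[num][field] / stats[den][field]


stats = parse_summary(LOG)
aligned_bp = ratio(stats, "Reads with an alignment", "Input reads", 1)
end_to_end = ratio(stats, "End-to-end alignments", "Alignments", 0)
print(f"aligned bases: {aligned_bp:.2%}")
print(f"end-to-end alignments: {end_to_end:.2%}")
```

Lines without a base-pair count, such as `Seeds found`, are skipped by the pattern. In this run every input read has an alignment, and most of the reported alignments are end-to-end.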