Closed andreaswallberg closed 4 years ago
Thanks so much. First, I will add an option to wtpoa-cns to output the mapping of coordinates between the layout file and the consensus sequences. Then I will ask for your help to run the consensus again and find the corresponding part of the layout file. Finally, I will debug the small region of the layout file that we locate.
I will reply to this issue after I finish the new option.
Jue
https://github.com/ruanjue/wtdbg2/commit/6329a5f0e2635c0b3a2c6db6a9115700089ca5a7.
wtpoa-cns <other-options> -e map.txt
#ctg ctg_off edge edge_full_len edge_off edge_len
ctg1 0 E0 3002 0 2916
ctg1 2916 E1 2785 917 2785
ctg1 4784 E2 2054 1026 1965
ctg1 5723 E3 2040 931 1658
ctg1 6450 E4 2580 627 2168
ctg1 7991 E5 1847 508 1847
ctg1 9330 E6 1575 4 1575
ctg1 10901 E7 2278 1 2278
ctg1 13178 E8 2747 1038 2128
ctg1 14268 E9 2010 461 2010
You can locate the peculiar regions by ctg+ctg_off, then find the problematic edge. E0 is the first layout block starting with E, E1 is the second, and so on for E2, ...
Best, Jue
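The lookup described above (contig name plus offset, then edge) could be sketched in Python as follows. This is only an illustration, not part of wtdbg2: the function names `load_edge_map` and `edge_at` are hypothetical, and it assumes that each row's edge covers the contig from its own ctg_off up to the ctg_off of the next row, and that positions are 0-based like the first row of the example output.

```python
import bisect
from collections import defaultdict

# Sample rows copied from the example output above.
SAMPLE = """#ctg ctg_off edge edge_full_len edge_off edge_len
ctg1 0 E0 3002 0 2916
ctg1 2916 E1 2785 917 2785
ctg1 4784 E2 2054 1026 1965
ctg1 5723 E3 2040 931 1658
"""

def load_edge_map(lines):
    """Parse map rows into {ctg: [(ctg_off, edge), ...]}, sorted by offset."""
    by_ctg = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        ctg, ctg_off, edge = line.split()[:3]
        by_ctg[ctg].append((int(ctg_off), edge))
    for rows in by_ctg.values():
        rows.sort()
    return dict(by_ctg)

def edge_at(edge_map, ctg, pos):
    """Return the edge whose ctg_off is the largest one not exceeding pos.

    Assumption (not confirmed by the tool's documentation): an edge spans
    the contig from its ctg_off up to the next row's ctg_off.
    """
    rows = edge_map[ctg]
    offsets = [off for off, _ in rows]
    i = bisect.bisect_right(offsets, pos) - 1
    if i < 0:
        raise ValueError(f"position {pos} lies before the first edge of {ctg}")
    return rows[i][1]

edge_map = load_edge_map(SAMPLE.splitlines())
print(edge_at(edge_map, "ctg1", 5000))
```

With a real run, the rows would come from the file passed to -e (for example `load_edge_map(open("map.txt"))`). Note that samtools tview displays 1-based positions, so a coordinate read from the viewer may need 1 subtracted under the 0-based assumption used here.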
Dear @ruanjue ,
I am working on the assembly of a large and complex genome and have noticed some odd motifs in the consensus sequence that may be associated with microsatellites. Basically, I seem to get microsatellite-associated homopolymer sequences in the consensus sequence that do not appear to be supported by any mapped ONT long read (I have mapped with both minimap2 and ma "modular aligner").
Admittedly, I have not done a systematic scan for these, but have just seen two cases when eyeballing a single contig with samtools tview. By their very nature, these regions consist of low-complexity DNA, but I am still puzzled by the result and wonder if it needs to be brought to the attention of you and the developers. Cheers!