Closed tianjio closed 4 months ago
The overlap length of the two reads at their contig positions was 247,283 bp, but the overlap length between the same two reads reported by the ovStoreDump command was only 127,200.
The readToTig file includes all reads, including contained reads, because it is used to build consensus, where contained reads are informative for the sequence. In general the read overlaps in the file will match the store, though if you're running with the -pacbio-hifi option, the ovStore will be in HPC (homopolymer-compressed) space while the readToTig file will not.
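To check how closely a pair of placements agrees with the store, you can compute the overlap implied by the readToTig coordinates. This is a minimal sketch, assuming the four-column format from the example in this thread (readID tigID bgn end) and assuming that bgn > end marks a read placed in reverse orientation. The function names are made up for illustration and are not part of Canu.

```python
def parse_read_to_tig(text):
    """Parse readToTig rows into {readID: (tigID, lo, hi, is_forward)}."""
    placements = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        read_id, tig_id, bgn, end = (int(x) for x in line.split()[:4])
        # Assumption: bgn > end means the read is reverse-complemented in the tig.
        placements[read_id] = (tig_id, min(bgn, end), max(bgn, end), bgn < end)
    return placements


def implied_overlap(placements, read_a, read_b):
    """Length of the tig interval shared by two placed reads (0 if none)."""
    tig_a, lo_a, hi_a, _ = placements[read_a]
    tig_b, lo_b, hi_b, _ = placements[read_b]
    if tig_a != tig_b:
        return 0  # reads on different tigs cannot overlap in tig space
    return max(0, min(hi_a, hi_b) - max(lo_a, lo_b))


# Rows copied from the readToTig example in this thread.
sample = """#readID tigID bgn end
2402 1 0 13221
1862 1 14388 539
4794 1 15075 1017
"""

placements = parse_read_to_tig(sample)
print(implied_overlap(placements, 2402, 1862))  # 12682
print(implied_overlap(placements, 2402, 4794))  # 12204
```

For reads 2402 and 1862 the layout implies 12,682 bp, while the store reports the overlap on 2402 as 539:13217, or 12,678 bp, so the two agree closely here. A large gap between the two numbers is a hint that the read was placed using a different overlap, or that the store is in compressed space.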
Sometimes an overlap may be a different size in the readToTig file than in the overlap store. The readToTig file is built pairwise, followed by read placement, so if a read was originally not used in the path, it may get added based on a different overlap than the one you've dumped. This can happen with reads from the alt haplotype. If a read is too diverged, consensus is allowed to skip over it. If you want to know the exact path the assembler is using to reconstruct the contig, use ovStoreDump with the -picture option and -bogart unitigging/4-unitigger/*.best. For example:
% head asm.contigs.layout.readToTig
#readID tigID bgn end
2402 1 0 13221
1862 1 14388 539
4794 1 15075 1017
7705 1 14300 1248
5508 1 19451 5833
4343 1 6326 19953
914 1 7826 22286
4848 1 8471 22375
4584 1 9511 22617
% ovStoreDump -S asm.seqStore -O unitigging/asm.ovlStore -picture 2402 -bogart `ls unitigging/4-unitigger/*.best.edges |sed s/.edges//g`
Opened seqStore 'asm.seqStore' for 'corrected-trimmed' reads.
A 0:13217 A 2402 0:13217 13217 |--------------------------------------------------------------------------------------------------->|
A 0:1426 B 5660 14208:15635 15635 0.000% +14208 |----------> | dovetail
A 0:3445 B 8031 13683:17130 17130 0.000% +13683 |--------------------------> | dovetail
A 0:3617 B 585 10856:14475 14475 0.000% +10856 |---------------------------> | dovetail
A 0:3998 B 4958 10507:14507 14507 0.025% +10507 |------------------------------> | dovetail
A 0:4327 B 1482 9207:13536 13536 0.023% +9207 |cccccccccccccccccccccccccccccccc> | contained
A 0:4378 B 4399 0:4379 14537 0.023% +10158 |<--------------------------------- | dovetail
A 0:4660 B 3481 0:4666 13737 0.021% +9071 |<ccccccccccccccccccccccccccccccccccc | contained
A 0:5939 B 6791 0:5939 13725 0.067% +7786 |<cccccccccccccccccccccccccccccccccccccccccccc | contained
A 0:6208 B 6648 6941:13159 13159 0.081% +6941 |cccccccccccccccccccccccccccccccccccccccccccccc> | contained
A 0:6810 B 7889 0:6812 16283 0.015% +9471 |<--------------------------------------------------- | dovetail
A 0:6837 B 4238 0:6833 14927 0.015% +8094 |<--------------------------------------------------- | dovetail
A 0:8174 B 7985 0:8177 13712 0.012% +5535 |<------------------------------------------------------------- | dovetail
A 0:10132 B 5250 4115:14251 14251 0.020% +4115 |============================================================================> | dovetail
A 539:13217 B 1862 1165:13849 13849 0.032% | <===============================================================================================| +1165 dovetail
A 1017:13217 B 4794 1854:14054 14054 0.041% | <--------------------------------------------------------------------------------------------| +1854 dovetail
A 1247:13217 B 7705 1076:13063 13063 0.033% | <cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc| +1076 contained
A 2276:13217 B 734 0:10943 14822 0.037% | gggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg>| +3879 coverage-gap
A 5832:13217 B 5508 6228:13623 13623 0.068% | <-------------------------------------------------------| +6228 dovetail
A 6325:13217 B 4343 0:6888 13624 0.058% | ---------------------------------------------------->| +6736 dovetail
A 7824:13217 B 914 0:5394 14453 0.056% | ---------------------------------------->| +9059 dovetail
A 8469:13217 B 4848 0:4745 13897 0.042% | ----------------------------------->| +9152 dovetail
A 9508:13217 B 4584 0:3712 13110 0.027% | ---------------------------->| +9398 dovetail
A 12366:13217 B 8759 0:851 13928 0.118% | ------>| +13077 dovetail
% ovStoreDump -S asm.seqStore -O unitigging/asm.ovlStore -picture 1862 -bogart `ls unitigging/4-unitigger/*.best.edges |sed s/.edges//g`
Opened seqStore 'asm.seqStore' for 'corrected-trimmed' reads.
A 0:13849 A 1862 0:13849 13849 |--------------------------------------------------------------------------------------------------->|
A 0:513 B 7143 14232:14747 14747 0.388% +14232 |---> | dovetail
A 0:744 B 7041 13867:14610 14610 0.268% +13867 |ccccc> | contained
A 0:752 B 3932 13976:14730 14730 0.663% +13976 |-----> | dovetail
A 0:999 B 5442 12129:13129 13129 0.599% +12129 |-------> | dovetail
A 0:1673 B 6202 14292:15952 15952 0.179% +14292 |gggggggggggg> | coverage-gap
A 0:2016 B 8759 0:2018 13928 0.198% +11910 |<-------------- | dovetail
A 0:4876 B 4584 0:4879 13110 0.000% +8231 |<----------------------------------- | dovetail
A 0:5916 B 4848 0:5912 13897 0.169% +7985 |<------------------------------------------ | dovetail
A 0:6562 B 914 0:6560 14453 0.015% +7893 |<----------------------------------------------- | dovetail
A 0:8062 B 4343 0:8050 13624 0.012% +5574 |<---------------------------------------------------------- | dovetail
A 0:8555 B 5508 5062:13623 13623 0.070% +5062 |-------------------------------------------------------------> | dovetail
A 0:12111 B 734 0:12112 14822 0.000% +2710 |<ggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg | coverage-gap
A 0:13371 B 4794 687:14054 14054 0.075% +687 |================================================================================================> | dovetail
A 87:13140 B 7705 0:13063 13063 0.008% |cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc> | contained
A 1165:13849 B 2402 539:13217 13217 0.032% | <===========================================================================================| +539 dovetail
A 4250:13849 B 5250 4655:14251 14251 0.052% | <---------------------------------------------------------------------| +4655 dovetail
A 6212:13849 B 7985 0:7637 13712 0.026% | ------------------------------------------------------->| +6075 dovetail
A 7550:13849 B 4238 0:6294 14927 0.000% | --------------------------------------------->| +8633 dovetail
A 7577:13849 B 7889 0:6271 16283 0.000% | --------------------------------------------->| +10012 dovetail
A 8179:13849 B 6648 7481:13159 13159 0.106% | <cccccccccccccccccccccccccccccccccccccccc| +7481 contained
A 8448:13849 B 6791 0:5399 13725 0.093% | cccccccccccccccccccccccccccccccccccccc>| +8326 contained
A 9727:13849 B 3481 0:4127 13737 0.000% | ccccccccccccccccccccccccccccc>| +9610 contained
A 10009:13849 B 4399 0:3839 14537 0.000% | --------------------------->| +10698 dovetail
A 10060:13849 B 1482 9747:13536 13536 0.000% | <ccccccccccccccccccccccccccc| +9747 contained
A 10389:13849 B 4958 11047:14507 14507 0.029% | <------------------------| +11047 dovetail
A 10770:13849 B 585 11396:14475 14475 0.000% | <----------------------| +11396 dovetail
A 10942:13849 B 8031 14223:17130 17130 0.000% | <--------------------| +14223 dovetail
A 12961:13849 B 5660 14746:15635 15635 0.000% | <------| +14746 dovetail
The === lines indicate the best edges in the graph, so here the path starts with read 2402, followed by 1862 and then 4794.
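If you want to follow those edges over many reads, the best-edge lines can be pulled out of the -picture output with a small script. This is a rough sketch, assuming the line layout shown in the dumps above and assuming that any overlap line whose picture contains "==" is a best edge. The helper names are hypothetical, the pictures in the sample text are shortened, and the walk is a simplification that just takes the best edge that does not lead back to the previous read.

```python
import re

# "A <a_bgn>:<a_end> <A|B> <readID> <b_bgn>:<b_end> <b_len> ..." as in the dumps above.
LINE = re.compile(r"^A\s+(\d+):(\d+)\s+([AB])\s+(\d+)\s+(\d+):(\d+)\s+(\d+)")


def parse_picture(text):
    """Return (query_read_id, [(best_read_id, a_bgn, a_end), ...])."""
    query, best = None, []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # e.g. the "Opened seqStore ..." line
        a_bgn, a_end, kind, read_id = int(m[1]), int(m[2]), m[3], int(m[4])
        if kind == "A":
            query = read_id  # first line describes the dumped read itself
        elif "==" in line:
            best.append((read_id, a_bgn, a_end))  # assumed best-edge marker
    return query, best


def next_read(best, came_from):
    """Pick the best edge that does not return to the previous read."""
    others = [rid for rid, _, _ in best if rid != came_from]
    return others[0] if len(others) == 1 else None


# Lines copied from the dumps above, with the pictures shortened.
dump_2402 = """Opened seqStore 'asm.seqStore' for 'corrected-trimmed' reads.
A 0:13217 A 2402 0:13217 13217 |------>|
A 0:8174 B 7985 0:8177 13712 0.012% +5535 |<---- | dovetail
A 0:10132 B 5250 4115:14251 14251 0.020% +4115 |=====> | dovetail
A 539:13217 B 1862 1165:13849 13849 0.032% | <=====| +1165 dovetail
A 1017:13217 B 4794 1854:14054 14054 0.041% | <-----| +1854 dovetail
"""
dump_1862 = """A 0:13849 A 1862 0:13849 13849 |------>|
A 0:13371 B 4794 687:14054 14054 0.075% +687 |=====> | dovetail
A 1165:13849 B 2402 539:13217 13217 0.032% | <=====| +539 dovetail
"""

query, best_2402 = parse_picture(dump_2402)
print(query, best_2402)  # 2402 [(5250, 0, 10132), (1862, 539, 13217)]

_, best_1862 = parse_picture(dump_1862)
print(next_read(best_1862, came_from=2402))  # 4794
```

Each read has at most one best edge per end, which is why excluding the read you arrived from leaves a single candidate for the next step.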
For the *contigs.layout.readToTig file, the first column is the read ID, the second column is the tig ID, the third column is the start position of the read in the tig, and the fourth column is the end position of the read in the tig. I sorted the file by read start position and found that many reads overlap with others at their tig positions. Are these reads redundant? Can I ignore them if I want to obtain the path of reads that make up the tig through their overlap relationships?