Closed luanelandau closed 1 month ago
Hi, We have this information for all HiFi reads, but only for few of ONT reads - those ones used to close gaps in HiFi
You'll need 6-layoutContigs/unitig-popped.layout
and 6-layoutContigs/unitig-popped.layout.scfmap
(for runs with hi-c/pore-c should use 8-hicPipeline/final_contigs/6-layoutContigs/unitig-popped.layout
& 8-hicPipeline/final_contigs/6-layoutContigs/unitig-popped.layout.scfmap
)
.layout.scfmap will give you mapping between scaffold names of form haplotype-smth to several pieces pieceXXXXX (just splitting on gaps), and .layout will give you lists of reads assigned to each piece.
For most of ONT reads we do not have such assignment. There is some indirect data, but actually I'd just realign them to the assemblies.
I'll add that both the *.layout
and the *.scfmap
files will be promoted to the top-level assembly folder so you don't nee to worry about the 6-layout or 8-hicPipeline distinction. Just grab assembly.scfmap
and assembly.homopolymer-compressed.layout
files. The scfmap will look like:
path haplotype1-0000001 haplotype1_from_utig4-0
piece000001
end
path haplotype2-0000002 haplotype2_from_utig4-3
piece000002
end
while the layout will look like:
tig piece000001
len 2866921
rds 6298
m84005_220827_014912_s1/71567556/ccs 18121 0
m84039_230418_213342_s3/19269101/ccs 17844 5246
m84039_230418_213342_s3/225903430/ccs 20947 5506
m84005_220827_014912_s1/126684173/ccs 17954 7546
m84005_220827_014912_s1/263785287/ccs 22734 9688
m84039_230418_213342_s3/33034490/ccs 13474 26619
m84039_230418_213342_s3/142280257/ccs 13525 27248
...
end
So if you want to find reads that belong to phased contig haplotype1-0000001
, look at reads in piece000001
in the layout. Note that some paths may have multiple pieces in case of gaps in the assembly.
If you want a friendlier way to get this information, including for all ONT reads, you can add the --consensus-bam
option to verkko (assuming you're on v2.2). This will make an assembly.bam
with each of the pieces having all their HiFi and ONT reads aligned to the final consensus sequence. This would require re-running part of the assembly though.
I am looking to locally reassemble a locus of interest using the phased reads from verkko. I would like to find the file that indexes which reads make up each of the phased contigs (paternal and maternal). How can I find that information, is it in any default output? Thank you!