lh3 / srf

SRF: Satellite Repeat Finder
MIT License

How to understand the meaning of output? #6

Open Orz-CQ opened 1 year ago

Orz-CQ commented 1 year ago

Hi Prof Li,

I am curious about how to read the results files, since they have no header and the manuscript does not describe them.

For example,

srf-aln.bed

Hap1Chr10   0   598 prefix#circ1-7  3.212577819676956   39397427    7   1
Hap1Chr10   645 860 prefix#circ1-7  7.834101382488479   39397427    7   1
Hap1Chr10   1335    1762    prefix#circ1-7  7.04896161086285    39397427    7   1
Hap1Chr10   1808    2129    prefix#circ1-7  8.475765485111276   39397427    7   1

srf-aln.len

prefix#circ2-510    1557996 0.9188644989499729  0.37808279015137913 0.37846257261974675
prefix#circ8-3288   388183  0.3024941409303183  0.09420134052291071 0.10453409306005174
prefix#circ4-1170   384661  0.9344680648134569  0.09334664796470571 0.09614660331296503
prefix#circ1-7  263795  4.78851708218328    0.064015793126544   0.06402380131916773

Thanks in advance, Lan

lh3 commented 1 year ago

I need to update the documentation. Briefly, for now: the BED file gives chr, start, end, SRF contig name, and mean percent identity (3.21 = 3.21%). You can ignore the rest of the columns.
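As a rough illustration of the BED column layout described above, here is a minimal Python sketch that reads one line of srf-aln.bed. The record type, field names, and function name are hypothetical and not part of SRF; only the first five columns are interpreted, and the remaining ones are ignored as suggested.

```python
from typing import NamedTuple


class SrfBedRecord(NamedTuple):
    # Field names are illustrative; they follow the column order described above.
    chrom: str
    start: int
    end: int
    contig: str
    identity: float  # column 5, described above as mean percent identity


def parse_srf_bed_line(line: str) -> SrfBedRecord:
    """Parse the first five whitespace-separated columns of one srf-aln.bed line."""
    fields = line.split()
    chrom, start, end, contig, value = fields[:5]
    return SrfBedRecord(chrom, int(start), int(end), contig, float(value))


# First line of the example from the question, written with tab separators.
sample = "Hap1Chr10\t0\t598\tprefix#circ1-7\t3.212577819676956\t39397427\t7\t1"
rec = parse_srf_bed_line(sample)
print(rec.chrom, rec.start, rec.end, rec.contig, rec.identity)
```

Splitting on any whitespace keeps the sketch tolerant of tabs or spaces, which assumes that no column value contains a space.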

For the abundance estimate, the columns are: SRF contig name, total length in bp, mean percent identity, filtered fraction, and unfiltered fraction. If you specify -g (highly recommended!), you get the fraction of the whole genome. If you don't specify -g, you get the fraction of SatDNA.
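The abundance columns above can be read the same way. The following is a minimal Python sketch, with hypothetical type, field, and function names that are not part of SRF. It only labels the five columns in the order listed and makes no assumption about how the fraction values are scaled.

```python
from typing import List, NamedTuple


class SrfAbundance(NamedTuple):
    # Field names are illustrative; they follow the column order described above.
    contig: str
    total_bp: int
    mean_identity: float
    filtered_fraction: float
    unfiltered_fraction: float


def parse_srf_len(text: str) -> List[SrfAbundance]:
    """Parse the contents of srf-aln.len into one record per SRF contig."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        records.append(
            SrfAbundance(
                fields[0],
                int(fields[1]),
                float(fields[2]),
                float(fields[3]),
                float(fields[4]),
            )
        )
    return records


# Two lines of the example from the question, written with tab separators.
sample = (
    "prefix#circ2-510\t1557996\t0.9188644989499729\t0.37808279015137913\t0.37846257261974675\n"
    "prefix#circ1-7\t263795\t4.78851708218328\t0.064015793126544\t0.06402380131916773\n"
)
records = parse_srf_len(sample)
for rec in records:
    print(rec.contig, rec.total_bp, rec.filtered_fraction, rec.unfiltered_fraction)
```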

I will keep this issue open until I update the README. Thanks!

baozg commented 1 year ago

Does the fraction have a unit, or is it a percentage of the whole genome? Taking prefix#circ2-510 as an example, is the fraction 0.37% or 37% if I use HiFi reads for the estimation?