veg / tn93

TN93 fast distance calculator

100s in `hyphy` output format #20

Open ArtPoon opened 4 years ago

ArtPoon commented 4 years ago

Setting -f to hyphy outputs a matrix made up mostly of entries equal to 100. To reproduce, I retrieved NCBI PopSet 1892228972 and downloaded the FASTA file as sequence.fasta:

art@Wernstrom Downloads % mafft sequence.fasta > hiv.mafft.fa
[ omit output ]
art@Wernstrom Downloads % tn93 -o temp.csv hiv.mafft.fa 
Read 602 sequences of length 1094
Will perform 180901 pairwise distance calculations
Progress:     100% (    2210 links found,          inf evals/sec)
{
    "Actual comparisons performed" :180901,
    "Comparisons accounting for copy numbers " :180901,
    "Total comparisons possible" : 180901,
    "Links found" : 2210,
    "Maximum distance" : 0.159834,
    "Sequences" : 602,
    "Mean distance" : 0.0907582,
[ truncate output ]
art@Wernstrom Downloads % head -n5 temp.csv 
ID1,ID2,Distance
MT787751.1 HIV-1 isolate 16854 from Italy nonfunctional pol protein (pol) gene, partial sequence,MT787795.1 HIV-1 isolate 16786 from Italy pol protein (pol) gene, partial cds,0.00367472
MT787751.1 HIV-1 isolate 16854 from Italy nonfunctional pol protein (pol) gene, partial sequence,MT787869.1 HIV-1 isolate 17835 from Italy pol protein (pol) gene, partial cds,0.0139596
MT787751.1 HIV-1 isolate 16854 from Italy nonfunctional pol protein (pol) gene, partial sequence,MT787972.1 HIV-1 isolate 2498686 from Italy pol protein (pol) gene, partial cds,0.0130311
MT787751.1 HIV-1 isolate 16854 from Italy nonfunctional pol protein (pol) gene, partial sequence,MT788268.1 HIV-1 isolate 520925 from Italy pol protein (pol) gene, partial cds,0.012028
art@Wernstrom Downloads % tn93 -o temp.txt -f hyphy hiv.mafft.fa
[ omit output ]
art@Wernstrom Downloads % head -n3 temp.txt
{
{0,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,1
00,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100}
{100,0,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,0.00367472,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,0.0139596,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,0.0130311,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,10
0,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,0.012028,100,100,0.00552784,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100}

See also #16

ArtPoon commented 4 years ago

Ah, it's not a bug. Distances are only stored in the matrix if they fall below the -t threshold, which defaults to 0.015. Setting -t1 reports all distances. Unexpected behaviour though :-)
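
To make the behaviour described above concrete, here is a minimal Python sketch of thresholded matrix filling. It is not tn93's actual code: the function name, the pair indices, and the use of 100 as a placeholder are assumptions for illustration (the placeholder is inferred from the output shown in this issue, and the distance values are taken from the transcript above).

```python
# Placeholder seen in the hyphy-format output above; assumed, not read from tn93's source.
SENTINEL = 100.0


def thresholded_matrix(n, pairs, threshold=0.015):
    """Build an n x n matrix that keeps only distances below `threshold`.

    `pairs` is an iterable of (i, j, distance) tuples. Pairs at or above
    the threshold are left as SENTINEL, mimicking the reported output.
    """
    matrix = [[0.0 if i == j else SENTINEL for j in range(n)] for i in range(n)]
    for i, j, d in pairs:
        if d < threshold:
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical pair indices; the distance values appear in the issue text.
pairs = [(0, 1, 0.00367472), (0, 2, 0.0139596), (1, 2, 0.159834)]

default = thresholded_matrix(3, pairs)                    # like the default -t 0.015
everything = thresholded_matrix(3, pairs, threshold=1.0)  # like -t1
```

With the default threshold, the pair at distance 0.159834 stays at the placeholder; with a threshold of 1 every pair is stored, which matches the observation that -t1 reports all distances.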

stevenweaver commented 4 years ago

Dear @ArtPoon,

This is a valid point. -t should be required and/or explicitly stated in the report written to stdout. Reopening.

Best, Steven

spond commented 4 years ago

Dear @ArtPoon and @stevenweaver,

What do you think the appropriate modification should be? One possibility is to report N/A or null in the matrix for distances that have not been computed.

Best, Sergei
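
Until the output format changes, one possible workaround along the lines of the N/A or null suggestion is to post-process the matrix on the reading side. The sketch below is only an illustration: `parse_hyphy_matrix` is a hypothetical helper, it assumes the file has the brace-delimited layout shown in this issue with one row per inner `{...}`, and it assumes 100 is the placeholder for distances that were not stored.

```python
import re

# Placeholder assumed from the output shown in this issue, not from tn93's source.
SENTINEL = 100.0


def parse_hyphy_matrix(text, sentinel=SENTINEL):
    """Parse a brace-delimited matrix and replace sentinel entries with None."""
    rows = []
    # Each innermost {...} group (no nested braces) is treated as one row.
    for body in re.findall(r"\{([^{}]*)\}", text):
        values = [float(tok) for tok in body.split(",") if tok.strip()]
        rows.append([None if v == sentinel else v for v in values])
    return rows


# Tiny hand-written sample in the same layout; not real tn93 output.
sample = "{\n{0,100,0.00367472}\n{100,0,100}\n{0.00367472,100,0}\n}"
rows = parse_hyphy_matrix(sample)
```

Using `None` keeps "not computed" distinct from any real distance, at the cost of the consumer having to handle missing values explicitly.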