veg / hivtrace

how to define the threshold? #88

Closed (liamxg closed 2 months ago)

liamxg commented 1 year ago

@stevenweaver @spond

stevenweaver commented 1 year ago

We have a new manuscript in the works for this very thing. Please stay tuned!

liamxg commented 1 year ago

Is this similar to https://github.com/PoonLab/clustuneR, or different?

stevenweaver commented 1 year ago

It's undergoing some final revisions. It is a bit different.

spond commented 1 year ago

Dear @liamxg,

If you use hivnetworkcsv (part of https://github.com/veg/hivclustering, which is a component of HIV-TRACE), you can use the --threshold auto option (written as -t auto in the example below).

For example, using the data that ships with hivclustering:

$ hivnetworkcsv -i examples/tn93-0.03.csv -f plain -t auto
....
ERROR : Could not automatically determine a distance threshold; no sufficiently strong outlier, best guess 0.01726 (score 1.07156)

In this particular case, the tool suggests a threshold of 0.01726 but is not confident enough to apply it automatically (in other cases it will be, as the upcoming manuscript will detail).

So next, you could rerun with that value set explicitly:

$ hivnetworkcsv -i examples/tn93-0.03.csv -f plain -t 0.01726

Cheers, Sergei
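
The two-step workflow in the reply above (run with -t auto, then rerun with the suggested value) could be scripted roughly as follows. This is only a sketch: extract_best_guess is a hypothetical helper, not part of hivclustering, and it assumes the message keeps the "best guess <number>" wording quoted above. Whether hivnetworkcsv writes that message to stdout or stderr is also an assumption, so the commented-out lines capture both.

```shell
#!/bin/sh
# Hypothetical helper (not part of hivclustering): read tool output on stdin
# and print the first number that follows the words "best guess", if any.
extract_best_guess() {
    sed -n 's/.*best guess \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# The message quoted in the reply above, used here as sample input.
msg='ERROR : Could not automatically determine a distance threshold; no sufficiently strong outlier, best guess 0.01726 (score 1.07156)'

guess=$(printf '%s\n' "$msg" | extract_best_guess)
echo "suggested threshold: $guess"   # prints: suggested threshold: 0.01726

# With the real tool, something like this might work (assumption: the message
# goes to stdout or stderr; 2>&1 captures both):
#   guess=$(hivnetworkcsv -i examples/tn93-0.03.csv -f plain -t auto 2>&1 | extract_best_guess)
#   [ -n "$guess" ] && hivnetworkcsv -i examples/tn93-0.03.csv -f plain -t "$guess"
```

Because the tool itself flagged this value as a low-confidence guess, it is probably worth inspecting the resulting clusters rather than accepting the number blindly.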