legumeinfo / azulejo

Tiling phylogenetic space with subtrees
BSD 3-Clause "New" or "Revised" License

finalize algorithm defaults #195

Open joelb123 opened 3 years ago

joelb123 commented 3 years ago

There are parameters that the user can choose for the algorithm:

  1. thorny or non-thorny
  2. peatmer vs. simple k-mer
  3. k-mer length
  4. strictly adjacent vs. extended adjacency (i.e., 1st, 2nd, or 3rd match of the peatmer)
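
To make the size of the search space concrete, here is a minimal sketch that enumerates combinations of the four parameters listed above. The names `SyntenyParams` and `parameter_grid`, the candidate k-mer lengths, and the adjacency range are all hypothetical placeholders for illustration; none of them are taken from the azulejo code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SyntenyParams:
    """One candidate setting of the four user-selectable parameters.

    Field names are hypothetical, not azulejo's actual option names.
    """

    thorny: bool          # thorny vs. non-thorny
    peatmer: bool         # peatmer vs. simple k-mer
    k: int                # k-mer length
    max_adjacency: int    # 1 = strictly adjacent; 2, 3 = extended adjacency


def parameter_grid(k_values=(2, 3, 4, 5), adjacency_values=(1, 2, 3)):
    """Return every combination of the four parameters.

    The default k_values and adjacency_values are assumed ranges only.
    """
    return [
        SyntenyParams(thorny, peatmer, k, adj)
        for thorny, peatmer, k, adj in product(
            (True, False), (True, False), k_values, adjacency_values
        )
    ]


grid = parameter_grid()
print(len(grid))
```

Each entry in the grid would correspond to one run to compare against the chosen standard, so even modest ranges give a few dozen runs per data set.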

As with most scientific algorithms, the defaults should be defensible choices obtained by comparison against a standard. There is an expected tradeoff between completeness (fraction of genes in a match) and collinearity (the parameter optimized by existing aligners such as DAGchainer). We are going to document how large this tradeoff is and justify a value, and we will emphasize collinearity where possible.
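
As a rough sketch of the completeness side of this tradeoff, the fraction of genes in a match could be computed as below. The function name and the use of plain gene-ID collections are assumptions for illustration; how matched genes would actually be extracted from azulejo or DAGchainer output is not specified here.

```python
def completeness(all_genes, matched_genes):
    """Fraction of genes that appear in at least one match.

    all_genes: iterable of gene IDs in the input set.
    matched_genes: iterable of gene IDs assigned to any match.
    IDs in matched_genes that are not in all_genes are ignored.
    """
    universe = set(all_genes)
    if not universe:
        return 0.0
    return len(universe & set(matched_genes)) / len(universe)


# Hypothetical gene IDs: two of four genes fall in a match.
example = completeness(["g1", "g2", "g3", "g4"], ["g1", "g3"])
print(example)
```

Computing this value for each parameter setting, alongside whatever collinearity measure is taken from the standard, would give the pairs of numbers needed to document the tradeoff.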

The proposed standard for matching is DAGchainer rather than minimap2, unless there is a reason to choose otherwise.

The proposed data set is glycine7 rather than glycine33. If a second set is chosen for comparison, it might be a good plan to pick one that includes a genome with a large number of small scaffolds, where the choices are likely to be sharper.

joelb123 commented 3 years ago

I missed a word in the title. This should be "synteny" algorithm defaults.