nleroy917 / optipyzer

Multi-Species Codon Optimization Engine
https://optipyzer.com
Apache License 2.0
23 stars 5 forks

Web interface is missing full description of sequence optimization results #56

Closed nleroy917 closed 9 months ago

nleroy917 commented 9 months ago

Problem

The current modal for displaying more information about the sequence optimization results appears cut off. It needs to be completed, and it needs to make a clear distinction between optimized_ad and optimized_sd.

Optimized SD vs. Optimized AD

They stand for "squared difference" and "absolute difference", respectively. From the code:

AD:

iterates upon the multi_table while optimizing the query to select the best-optimized DNA sequence using an absolute-difference based method

SD:

iterates upon the multi_table while optimizing the query to select the best-optimized DNA sequence using a sum of squares of differences based method

Basically, the difference is how the error is computed when adjusting the codon frequency table for multiple species. There are two ways that we can do this: 1) squared error, and 2) absolute error. Their respective formulas look something like this:

$$ \text{SD} = \left( y - \hat{y} \right)^2 $$

$$ \text{AD} = | y - \hat{y} | $$
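To make the two formulas concrete, here is a minimal Python sketch. It is not taken from the optipyzer code: the function names are hypothetical, and treating $y$ as a target codon frequency and $\hat{y}$ as the frequency observed in a candidate sequence is my assumption. Summing the per-codon terms follows the "sum of squares of differences" wording in the docstring above.

```python
# Hypothetical helpers illustrating the SD and AD formulas above.
# These are NOT optipyzer functions; names and inputs are assumptions.

def squared_difference(y, y_hat):
    """SD term: (y - y_hat)^2."""
    return (y - y_hat) ** 2


def absolute_difference(y, y_hat):
    """AD term: |y - y_hat|."""
    return abs(y - y_hat)


def total_error(target, observed, metric):
    """Sum a per-codon error metric over paired frequencies."""
    return sum(metric(t, o) for t, o in zip(target, observed))


# Made-up frequencies for three codons (assumed values, for illustration only).
target = [0.5, 0.25, 0.25]
observed = [0.25, 0.5, 0.25]

sd_total = total_error(target, observed, squared_difference)
ad_total = total_error(target, observed, absolute_difference)

print("sum of squared differences: ", sd_total)
print("sum of absolute differences:", ad_total)
```

Both totals are non-negative, and both are zero only when the observed frequencies match the target exactly.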

You can see that both produce non-negative values, but there seems to be debate about which is better. I didn't actually write the original code, however, so I am reluctant to make a definitive choice about which is better. Here is a Stack Exchange thread about which might be more appropriate. Notably:

The benefits of squaring include:

  • Squaring always gives a non-negative value, so the sum will always be zero or higher.
  • Squaring emphasizes larger differences, a feature that turns out to be both good and bad (think of the effect outliers have).
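The outlier point can be illustrated with a small Python sketch. The deviation values and candidate names below are made up for illustration and are not optipyzer output. One candidate spreads its error evenly across codons, and the other concentrates it in a single large deviation. The two metrics then disagree about which candidate is better.

```python
# Hypothetical per-codon deviations (y - y_hat) for two candidate sequences.
# Values are invented to show how the two metrics can rank candidates differently.
candidates = {
    "spread": [0.25, 0.25, 0.25, 0.25],  # many small deviations
    "outlier": [0.75, 0.0, 0.0, 0.0],    # one large deviation
}


def sum_abs(deviations):
    """Sum of absolute differences (AD-style score)."""
    return sum(abs(d) for d in deviations)


def sum_sq(deviations):
    """Sum of squared differences (SD-style score)."""
    return sum(d * d for d in deviations)


# Lower score is better under either metric.
best_by_ad = min(candidates, key=lambda name: sum_abs(candidates[name]))
best_by_sd = min(candidates, key=lambda name: sum_sq(candidates[name]))

print("best by absolute difference:", best_by_ad)
print("best by squared difference: ", best_by_sd)
```

Under the absolute metric the single large deviation is cheaper overall (0.75 against 1.0), while squaring penalizes it more heavily (0.5625 against 0.25), so the evenly spread candidate wins instead.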

Therefore, it seems that squaring might be favorable, and perhaps we can do away with optimized_ad altogether, at least in the UI, to keep it clean and simple.