Where does the data in codon_usage.csv come from? #63

Hi @kuainaiyang, this is a good question. You're definitely correct that those two numbers are different. They probably don't match because we are using two different sources of codon usage stats. The database you linked looks like it contains data from 2007, while our codon usage tables come from a 2019 paper. In addition, those are raw counts, but what really matters is the ratio of codons for each residue in a protein. So the total count shouldn't matter as long as the distribution remains nearly the same.

Moreover, you're looking at Glycine here with GGG. If you crunch the numbers for Glycine using the counts you've pointed out:

669768 / (669768 + 669873 + 903565 + 437126)

You'll get a preference of ~25.0%. Using the numbers from our database, you'll get:

1204747 / (1204747 + 1341018 + 1558367 + 846367) = 1204747 / 4950499

which is ~24.3%. So, pretty darn close...
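
If you want to reproduce the comparison yourself, here is a minimal Python sketch. The helper name `codon_preference` is made up for illustration and is not part of optipyzer. The counts are the ones quoted above, with GGG first and the other three Glycine codons in the order they appear in the sums.

```python
def codon_preference(target_count, all_counts):
    """Return the share (in percent) of one codon among all codons for a residue."""
    return 100 * target_count / sum(all_counts)


# Glycine codon counts quoted in this thread; GGG is the first entry in each list.
linked_db_counts = [669768, 669873, 903565, 437126]
our_counts = [1204747, 1341018, 1558367, 846367]

linked_db_pct = codon_preference(linked_db_counts[0], linked_db_counts)
our_pct = codon_preference(our_counts[0], our_counts)

print(f"linked database GGG preference: {linked_db_pct:.1f}%")
print(f"our table GGG preference:       {our_pct:.1f}%")
print(f"absolute difference:            {abs(linked_db_pct - our_pct):.1f} points")
```

Because each value is normalized by the total for the residue, the very different absolute counts in the two sources drop out of the comparison.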

So, I think it's just different sources of codon usage data, but the fact that they are so similar gives me confidence that they are both reasonably correct.

nleroy917 / optipyzer
