saladtheory / saladtheory.github.io

Salad Theory

Suggestions for Ingredient Entropy #31

Open wrongbad opened 5 months ago

wrongbad commented 5 months ago

I first thought about opening a PR, but figured it could be discussed first. In particular, I wonder whether we want to simply replace the old entropy equation or add a new section comparing different entropy metrics (assuming this idea is accepted at all).

I think we can look at Shannon Entropy for inspiration here. In particular, it's widely used to analyze the entropy of word salad in large language model training.

Another bonus of Shannon entropy with log-base-2 is that it exactly represents the number of bits needed to encode which ingredients you ate in which order, assuming you know the ingredient mixture ratio already.

For ingredients, we would use each ingredient's proportion of the mixture as its probability. Compared to the previous formula, which uses only the total number of types and the total number of elements, the new formula captures some intuitive cases of low-entropy mixtures.
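Rough sketch in Python of both formulas side by side (the function names `old_entropy` / `shannon_entropy` and the dict-of-counts bowl representation are just mine for illustration, nothing from the repo):

```python
from math import log2

def old_entropy(counts):
    # old formula: log2(total elements) * number of ingredient types
    total = sum(counts.values())
    return log2(total) * len(counts)

def shannon_entropy(counts):
    # proposed formula: each element contributes -log2(p), where p is that
    # ingredient's proportion of the mixture, i.e. sum_i -count_i * log2(count_i / total)
    total = sum(counts.values())
    return sum(-c * log2(c / total) for c in counts.values())
```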

Example 1

1 peanut, 1 cashew, and 98 almonds in a bowl

Old: log2(100) * 3 = 19.93
New: -[ log2(1/100) * 1 + log2(1/100) * 1 + log2(98/100) * 98 ] = 16.14

Example 2

33 peanuts, 33 cashews, and 34 almonds

Old: log2(100) * 3 = 19.93
New: -[ log2(33/100) * 33 + log2(33/100) * 33 + log2(34/100) * 34 ] = 158.48

The increased entropy makes sense here, because it measures how unpredictable the mixture is.

Example 3

1 ice cube

Old: log2(1) * 1 = 0
New: -[ log2(1/1) * 1 ] = 0

Example 4

8 ice cubes

Old: log2(8) * 1 = 3
New: -[ log2(1/1) * 8 ] = 0

I think this solves @fragilechin's concern in #18. More of a 1-ingredient mixture does not increase entropy.
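For reference, the four examples above come out as quoted when run through the hypothetical helpers sketched earlier:

```python
bowls = {
    "Example 1": {"peanut": 1, "cashew": 1, "almond": 98},
    "Example 2": {"peanut": 33, "cashew": 33, "almond": 34},
    "Example 3": {"ice cube": 1},
    "Example 4": {"ice cube": 8},
}

for name, bowl in bowls.items():
    print(name, round(old_entropy(bowl), 2), round(shannon_entropy(bowl), 2))
# Example 1 19.93 16.14
# Example 2 19.93 158.48
# Example 3 0.0 0.0
# Example 4 3.0 0.0
```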

Other Options

Normalized Entropy

Divide the Shannon entropy by the total number of ingredient elements to get entropy per element, or the "average bits per element" of the mixture.
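Continuing the same hypothetical sketch, this is a one-line extension:

```python
def normalized_entropy(counts):
    # average bits per ingredient element
    total = sum(counts.values())
    return shannon_entropy(counts) / total
```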

Perplexity

Perplexity is another common metric used in analyzing the statistics of word salad. It's essentially 2 ^ (Shannon Entropy). This represents roughly the number of unique ways (weighted by their probability) you might end up consuming the ingredient elements.
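Sketched the same way (again, just my naming):

```python
def perplexity(counts):
    # 2 raised to the Shannon entropy of the whole bowl
    return 2 ** shannon_entropy(counts)
```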

Normalized Perplexity

As with entropy, normalizing the perplexity per ingredient element (i.e. raising 2 to the bits per element) provides a different and interesting insight. This time it represents the statistically weighted variety of ingredients that might be the next ingredient element you eat. This may sound abstract, so let's revisit the nuts example:

Example 1: 1 peanut, 1 cashew, and 98 almonds in a bowl

norm-perplexity = 2 ^ -[ log2(1/100)*1/100 + log2(1/100)*1/100 + log2(98/100)*98/100 ] = 1.12

Because your expectation of the next nut is heavily biased toward a single kind of nut, the perplexity is barely above 1.

Example 2: 33 peanuts, 33 cashews, and 34 almonds

norm-perplexity = 2 ^ -[ log2(33/100)*33/100 + log2(33/100)*33/100 + log2(34/100)*34/100 ] = 2.9997

In this case all 3 options are almost equally likely, so the normalized perplexity represents a fair 3-way unpredictability.
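Both nut examples check out with a sketched normalized perplexity built on the hypothetical helpers above:

```python
def normalized_perplexity(counts):
    # 2 raised to the bits-per-element: the effective number of ingredient
    # choices for the next element you eat
    return 2 ** normalized_entropy(counts)

print(round(normalized_perplexity({"peanut": 1, "cashew": 1, "almond": 98}), 2))    # 1.12
print(round(normalized_perplexity({"peanut": 33, "cashew": 33, "almond": 34}), 4))  # 2.9997
```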

Cloving Thoughts and Onions

I'm personally more in favor of the normalized entropy metrics, as they tell you more directly about the mixture and distribution (intrinsic properties of the food recipe) without being coupled to the quantity of food. One could always multiply by the number of ingredient elements to get quantity-dependent absolute entropy if desired.

I would also probably recommend Normalized Perplexity because it moves back into the linear domain and correlates more directly with the number of unique ingredients, so people can compare salad recipes more intuitively with this metric.

Finally, I have a proposal for @FragileChin's request for a metric that normalizes to the 0-1 range. You could, for example, use 1 - 1 / NormalizedPerplexity. For salads with 1 unique ingredient, this becomes 0. For an equal mix of 10 ingredients, it becomes 0.9. For an equal mix of 100 ingredients, it becomes 0.99. A true 1.0 salad would require infinitely many ingredients, but I see no finite alternative, because adding more ingredients should always increase salad entropy.
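A sketch of that proposal, continuing the same hypothetical helpers (the name `bounded_salad_score` is made up):

```python
def bounded_salad_score(counts):
    # 1 - 1 / normalized perplexity: 0 for a single-ingredient bowl,
    # approaching (but never reaching) 1 as even ingredient variety grows
    return 1 - 1 / normalized_perplexity(counts)

print(bounded_salad_score({"ice cube": 8}))                               # 0.0
print(round(bounded_salad_score({f"nut{i}": 1 for i in range(10)}), 2))   # 0.9
print(round(bounded_salad_score({f"nut{i}": 1 for i in range(100)}), 2))  # 0.99
```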

P.S.

Word salad seems like a content-rich environment for drawing more parallels; I hope it inspires more people.

azinoveva commented 4 months ago

I like the Shannon entropy approach because:

Subsequently, this also holds for perplexity or normalized metrics.