Closed aguzev closed 3 months ago
Any suggestion about how to improve the text?
I'm not a good English writer. Sorry for wasting time on basic math.
The sentence about cosine similarity seems OK to me; there's only a typo in the Euclidean part at the end. This change should be sufficient:
Cosine similarity is particularly useful when working with high-dimensional data such as word embeddings because it takes into account both the magnitude and direction of each vector. This makes it more robust than other measures like Euclidean distance, which only considers the ~magnitude~ direction.
Given two n-dimensional points A and B
Cosine similarity: $\frac{A·B}{|A||B|}$
Where:
A dot B is the dot product of the vectors A and B
|A| is the magnitude of vector A
|B| is the magnitude of vector B
so cosine similarity does take into account the magnitude, as mentioned.
Euclidean distance: $\sqrt{\sum_{i=1}^{n}(A_i - B_i)^2}$
so Euclidean distance considers only the direction, not the magnitude -- this is the part to fix.
Docs updated
I'm sorry for wasting your time on basic math but it is important.
In a Euclidean space
$A·B = |A||B| \cos β$
so
$\frac{A·B}{|A||B|} = \frac{|A||B| \cos β}{|A||B|} = \cos β$
In fact, $\frac{A·B}{|A||B|} = \cos β$ is the definition of the angle β between two vectors.
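A quick way to check that the ratio above depends only on the angle is to rescale both vectors and see that the value does not move. This is a minimal sketch using only the Python standard library; the helper names `cosine_similarity` and `scale` and the sample vectors are my own choices for illustration, not anything from the KM docs.

```python
import math

def cosine_similarity(a, b):
    """Return A·B / (|A||B|) for two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def scale(v, k):
    """Multiply every component of v by k (changes magnitude, not direction, for k > 0)."""
    return [k * x for x in v]

# Hypothetical sample vectors, chosen so that the angle between them is π/4.
a = [1.0, 2.0]
b = [3.0, 1.0]

base = cosine_similarity(a, b)
scaled = cosine_similarity(scale(a, 10.0), scale(b, 0.5))

# The magnitudes changed by factors of 10 and 0.5, the similarity did not.
print(math.isclose(base, scaled))

# Recover the angle β from the definition cos β = A·B / (|A||B|).
angle = math.acos(base)
print(math.isclose(angle, math.pi / 4))
```

Both checks should hold for any positive scale factors, since the factors cancel between the numerator and the denominator.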
Given five vectors $A=(0,3), B=(0,2), C=(0,1), D=(2,0), E=(\sqrt{2},\sqrt{2})$
Magnitudes are $|A| = 3, |B| = 2, |C| = 1, |D| = 2, |E| = 2$
Euclidean distances ρ are
$ρ(A,B) = \sqrt{(0-0)^2+(3-2)^2} = 1$
$ρ(B,C) = \sqrt{(0-0)^2+(2-1)^2} = 1$
$ρ(A,C) = \sqrt{(0-0)^2+(3-1)^2} = 2$
$ρ(A,D) = \sqrt{(0-2)^2+(3-0)^2} = \sqrt{13}$
$ρ(B,D) = \sqrt{(0-2)^2+(2-0)^2} = \sqrt{8}$
$ρ(C,D) = \sqrt{(0-2)^2+(1-0)^2} = \sqrt{5}$
$ρ(A,E) = \sqrt{(0-\sqrt{2})^2+(3-\sqrt{2})^2} ≈ 2.12$
$ρ(B,E) = \sqrt{(0-\sqrt{2})^2+(2-\sqrt{2})^2} ≈ 1.53$
$ρ(C,E) = \sqrt{(0-\sqrt{2})^2+(1-\sqrt{2})^2} ≈ 1.47$
$ρ(D,E) = \sqrt{(2-\sqrt{2})^2+(0-\sqrt{2})^2} ≈ 1.53$
Cosine similarities σ are
$σ(A,B) = \frac{0×0+3×2}{3×2} = \frac{6}{6} = 1 = \cos 0$
$σ(B,C) = \frac{0×0+2×1}{2×1} = \frac{2}{2} = 1 = \cos 0$
$σ(A,C) = \frac{0×0+3×1}{3×1} = \frac{3}{3} = 1 = \cos 0$
$σ(A,D) = \frac{0×2+3×0}{3×2} = \frac{0}{6} = 0 = \cos{\frac{π}{2}}$
$σ(B,D) = \frac{0×2+2×0}{2×2} = \frac{0}{4} = 0 = \cos{\frac{π}{2}}$
$σ(C,D) = \frac{0×2+1×0}{1×2} = \frac{0}{2} = 0 = \cos{\frac{π}{2}}$
$σ(A,E) = \frac{0×\sqrt{2}+3×\sqrt{2}}{3×2} = \frac{3\sqrt{2}}{6} = \frac{1}{\sqrt{2}} = \cos{\frac{π}{4}}$
$σ(B,E) = \frac{0×\sqrt{2}+2×\sqrt{2}}{2×2} = \frac{2\sqrt{2}}{4} = \frac{1}{\sqrt{2}} = \cos{\frac{π}{4}}$
$σ(C,E) = \frac{0×\sqrt{2}+1×\sqrt{2}}{1×2} = \frac{\sqrt{2}}{2} = \frac{1}{\sqrt{2}} = \cos{\frac{π}{4}}$
$σ(D,E) = \frac{2×\sqrt{2}+0×\sqrt{2}}{2×2} = \frac{2\sqrt{2}}{4} = \frac{1}{\sqrt{2}} = \cos{\frac{π}{4}}$
A, B, and C have the same direction but different magnitudes. Their Euclidean distances differ, while their cosine similarities are all equal to 1. So cosine similarity depends only on direction, whereas Euclidean distance takes into account both direction and magnitude.
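For anyone who wants to re-check the numbers for the five vectors A to E, here is a minimal Python sketch that computes every pairwise Euclidean distance and cosine similarity. It uses only the standard library; the helper names `euclidean` and `cosine` and the `results` dictionary are my own, introduced just for this illustration.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# The five 2-D vectors from the example.
vectors = {
    "A": (0.0, 3.0),
    "B": (0.0, 2.0),
    "C": (0.0, 1.0),
    "D": (2.0, 0.0),
    "E": (math.sqrt(2), math.sqrt(2)),
}

# Map each unordered pair of names to (distance, similarity).
results = {}
for (n1, v1), (n2, v2) in combinations(vectors.items(), 2):
    results[(n1, n2)] = (euclidean(v1, v2), cosine(v1, v2))
    dist, sim = results[(n1, n2)]
    print(f"{n1},{n2}: distance={dist:.2f} cosine={sim:.2f}")
```

The pairs among A, B, and C are the interesting ones: the distance column changes with the magnitudes, while the cosine column stays the same.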
Context / Scenario
Read the document Cosine Similarity.
What happened?
The document Cosine Similarity contains the following text:
Both sentences are far enough from the truth to warrant a rewrite.
Importance
a fix would make my life easier
Platform, Language, Versions
KM Version 0.62.