loretoparisi / fasttext.js

FastText for Node.js
MIT License
192 stars, 28 forks

calculate distance feature? #20

Closed: crapthings closed this issue 3 years ago

crapthings commented 5 years ago

Maybe add an API to calculate the distance between word vectors?

fasttext print-word-vectors trainresult.bin < queries.txt

住宅 -0.3543 -0.36086 -0.1972 -0.48346 -0.4279 0.084653 -0.74038 -0.77876 -0.69068 -0.42149 0.41304 0.9636 -0.11907 -0.081701 0.27681 -0.15278 -0.17322 -0.27368 -0.69611 0.42335 0.11701 -0.43995 0.1868 0.38824 0.42387 0.46397 0.38974 -0.59129 0.69363 0.26292 -0.36955 -0.27438 1.0732 0.0046569 -0.39709 0.44935 0.67039 -0.39564 -0.080179 0.0036072 -0.48187 -0.66577 0.27598 -0.54607 1.0294 -0.29769 0.52144 -0.044384 0.15926 -1.0104 0.80332 -0.60356 0.40641 -0.039965 0.41868 -0.0072699 0.069652 -0.12544 -0.30716 0.21804 -0.36222 -0.51133 -0.24029 -0.7333 0.26404 -0.30949 -0.17224 -0.52331 -1.1139 -0.26803 0.4566 0.28051 -0.50781 0.26043 0.11501 0.17622 -0.1344 -0.46 0.00035005 0.13337 0.50925 -0.82658 0.32135 -0.33323 0.75423 -0.60863 0.42117 0.35665 -0.17826 -0.82987 0.53353 -0.12717 -0.46963 0.15568 0.4642 -0.16868 -0.18377 0.65137 -0.0067536 1.4116

别墅 0.00094935 0.0073073 -0.00094808 -0.0010876 0.0012463 0.0014312 -0.0026107 0.0041731 0.0024454 -0.00093893 0.0045996 0.00050681 -0.00040101 0.0015428 0.0065499 -0.0007207 -0.0022505 -0.0046939 0.0039677 0.0047148 -0.0031379 0.0042863 -0.0056759 -0.0031934 0.0037867 0.006272 0.0050499 -0.0022674 0.0062237 0.00062629 0.0033722 -0.0027245 0.0016423 -0.0037467 -0.00014838 -0.0048198 0.0043823 0.002268 -0.00093589 -0.0034395 -0.0021894 0.0013966 -0.0010953 -0.00073448 0.0012601 0.00037782 -0.0012559 -0.00079777 0.0022461 -0.00085852 -0.001242 0.0039883 0.0017836 -0.00036524 -0.0013768 -0.0036831 0.0023176 0.0027225 0.0010305 0.0020299 0.00057907 -3.4135e-05 0.0029027 -0.00064469 7.3418e-05 -0.0051284 -0.0001829 -0.004983 -0.0024 -0.002313 -2.4026e-05 0.0068082 -0.0062092 0.0045259 -0.0023891 0.0015408 0.00077602 -0.0024638 0.0056508 0.0036942 -0.00089141 -0.0031128 -0.0040772 -0.00063497 -0.006542 -0.0016326 0.002223 -0.0040703 -3.8115e-05 -0.0020506 -0.003437 0.0037226 -0.0062743 0.00098213 0.00030893 0.0013302 -0.002533 0.0038249 -0.0050515 0.0025223
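For readers who want to compute distances from this output themselves, here is a minimal sketch of turning `print-word-vectors` lines into numeric arrays. It assumes each line has the form `<word> <v1> <v2> ... <vN>`, as in the output above. The helper names `parseWordVectorLine` and `parseWordVectors` are hypothetical and are not part of fasttext.js.

```javascript
// Hypothetical helper (not part of fasttext.js): parse one output line of
// `fasttext print-word-vectors`, assumed to look like "<word> <v1> ... <vN>".
function parseWordVectorLine(line) {
  const parts = line.trim().split(/\s+/);
  const word = parts[0];
  // Number() also handles scientific notation such as -3.4135e-05.
  const vector = parts.slice(1).map((value) => Number(value));
  if (vector.length === 0 || vector.some((value) => Number.isNaN(value))) {
    throw new Error(`Malformed vector line for "${word}"`);
  }
  return { word, vector };
}

// Parse a whole block of output, skipping blank lines.
function parseWordVectors(output) {
  return output
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map(parseWordVectorLine);
}

// Usage: the two words from this thread, truncated to three components each.
const sampleOutput =
  '住宅 -0.3543 -0.36086 -0.1972\n\n别墅 0.00094935 0.0073073 -0.00094808';
const parsedVectors = parseWordVectors(sampleOutput);
console.log(parsedVectors.map((entry) => `${entry.word}: ${entry.vector.length} dims`));
```

The resulting `vector` arrays can then be passed to any distance function that accepts plain arrays of numbers.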

loretoparisi commented 5 years ago

@crapthings good idea, I can add some distance metrics.

bittlingmayer commented 5 years ago

Python implementation: https://gist.github.com/bittlingmayer/b0025a97016cea9c3a18689ae1e7be3e

crapthings commented 5 years ago

JS implementation?

https://github.com/compute-io/cosine-similarity
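For illustration, cosine similarity is small enough to write without a dependency. The sketch below is plain JavaScript and does not use the API of the package linked above. The function name `cosineSimilarity` and the choice to return 0 for a zero-length vector are assumptions made for this example.

```javascript
// Cosine similarity between two equal-length numeric arrays:
// dot(a, b) / (|a| * |b|). A sketch, not the fasttext.js implementation.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let sumSquaresA = 0;
  let sumSquaresB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    sumSquaresA += a[i] * a[i];
    sumSquaresB += b[i] * b[i];
  }
  // Assumed convention: similarity with an all-zero vector is reported as 0
  // instead of dividing by zero.
  if (sumSquaresA === 0 || sumSquaresB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(sumSquaresA) * Math.sqrt(sumSquaresB));
}

// Usage: parallel vectors score close to 1, orthogonal vectors score 0.
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6]));
console.log(cosineSimilarity([1, 0], [0, 1]));
```

Note that the result does not depend on vector magnitude, so the very different scales of the two vectors posted earlier in this thread do not matter for this metric.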

loretoparisi commented 5 years ago

Thanks guys, please wait while I implement this... ⚙️ I'm also adding dimensionality reduction with PCA and t-SNE.

Adding some reference projects I'm looking at - thanks @bittlingmayer ;)

Cosine-Similarity - https://github.com/compute-io/cosine-similarity

TSNE - https://github.com/karpathy/tsnejs

TSNE - https://github.com/scienceai/tsne-js

PCA - https://github.com/mljs/pca, https://github.com/bitanath/pca

Adding also (since I was not aware of)

TSNE for TF.js - https://github.com/tensorflow/tfjs-tsne

and also

Annoy - https://github.com/jimkang/annoy-node

Regarding cosine distance and cosine similarity, keep an eye on this discussion with the Gensim author; it covers useful ways to normalize vectors and to convert from the distance metric.
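As a rough sketch of the kind of normalization and conversion mentioned here: under one common convention, cosine distance is defined as 1 minus cosine similarity, and for vectors scaled to unit length the cosine similarity is just the dot product. The referenced discussion may use a different convention. The function names below are hypothetical.

```javascript
// Scale a vector to unit (L2) length. A zero vector is returned unchanged.
function l2Normalize(v) {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) {
    return v.slice();
  }
  return v.map((x) => x / norm);
}

// Plain dot product; for unit-length inputs this equals cosine similarity.
function dotProduct(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Assumed convention: distance = 1 - similarity, so the conversion is symmetric.
function similarityToDistance(similarity) {
  return 1 - similarity;
}

function distanceToSimilarity(distance) {
  return 1 - distance;
}

// Usage: normalize once, then compare with a dot product.
const unitA = l2Normalize([3, 4]);
const unitB = l2Normalize([4, 3]);
const similarityAB = dotProduct(unitA, unitB);
const distanceAB = similarityToDistance(similarityAB);
console.log(similarityAB, distanceAB);
```

Normalizing every vector once up front means each later comparison needs only a dot product, which helps when many pairs are compared.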

I'm going to add a minimal implementation that uses external dependencies only if strictly needed (I would prefer none, by the way, to keep everything in this package). Stay tuned!