martinapugliese / tales-science-data

WORK UNDER RESTRUCTURING

Fasttext #215

Open martinapugliese opened 3 years ago

martinapugliese commented 3 years ago

From the notebook I had on this

"FastText is an extension to Word2Vec proposed by Facebook in 2016. Instead of feeding individual words into the Neural Network, FastText breaks words into several n-grams (sub-words). For instance, the tri-grams for the word apple are app, ppl, and ple (ignoring the starting and ending boundaries of words). The word embedding vector for apple will be the sum of all these n-grams. After training the Neural Network, we will have word embeddings for all the n-grams given the training dataset. Rare words can now be properly represented since it is highly likely that some of their n-grams also appear in other words. I will show you how to use FastText with Gensim in the following section." from https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a209c1d3e12c

FastText is a word embeddings model released by Facebook (2016) which, unlike the Word2Vec suite, uses portions of words (character n-grams) instead of whole ones: it learns vectors for these sub-word units and sums them up to build the word vector. For instance, the word "apple" yields the tri-grams "app", "ppl" and "ple", and its embedding is the sum of the embeddings of its n-grams. A consequence is that rare or out-of-vocabulary words can still receive a meaningful representation, since their n-grams are likely to appear in other words seen during training.
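The sub-word mechanism described above can be sketched in a few lines of plain Python. This is a hypothetical minimal illustration, not FastText's actual implementation: real FastText wraps each word in `<` and `>` boundary markers, uses a range of n-gram lengths, hashes n-grams into a fixed bucket table, and includes the whole word itself as one of the units; here we just extract tri-grams and sum their vectors.

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word, with FastText-style '<' '>' boundary
    markers added (so 'apple' also yields '<ap' and 'le>')."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def word_vector(word, ngram_vectors, dim=4):
    """Sum the vectors of the word's n-grams; unseen n-grams contribute
    a zero vector. `ngram_vectors` is a dict mapping n-gram -> list of floats."""
    total = [0.0] * dim
    for g in char_ngrams(word):
        vec = ngram_vectors.get(g, [0.0] * dim)
        total = [t + v for t, v in zip(total, vec)]
    return total

print(char_ngrams("apple"))  # ['<ap', 'app', 'ppl', 'ple', 'le>']
```

Because the word vector is assembled from n-gram vectors, any string, even one never seen in training, gets a representation as long as some of its n-grams were learned.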