preseries / GASP

The General Accepted Startup Principles project
MIT License

How to quantify the description fields in the model? #4

Closed: bearnxx closed this issue 5 years ago

bearnxx commented 5 years ago

There are some string fields in the model, like product_description, product_keywords, business_solution, technical_solution and so on.

How can these features be quantified for use in the predictive model?

xalperte commented 5 years ago

In our datasets we have different types of fields: numerical, categorical, and textual. The textual fields, such as the ones you mention, are used in our models after several natural language processing steps (translation into English, tokenization, stopword removal, lemmatization, stemming, n-gram identification, entity identification, enrichment with synonyms, etc.). They are then transformed into term-frequency or word-embedding representations, such as word2vec, so they can be used in predictive modeling. These representations are especially useful for segmentation (by market, specialty, etc.) and for identifying similar companies or competitors.

bearnxx commented 5 years ago

Very clear. Thanks 😁