robertisandor / USF-MSDS621-KaggleHomeDepot

A machine learning project to predict the relevancy of products returned for given search terms on the Home Depot website

New features #5

Closed: jwang917 closed this issue 5 years ago

jwang917 commented 5 years ago

Clean up the description, query, and attribute text by separating connected words and fixing some other issues.
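A minimal sketch of what "separating connected words" could look like, assuming the connected words come from missing spaces at a lowercase-to-uppercase boundary or after punctuation. The function name `split_connected_words` and both regular expressions are illustrative assumptions, not the code used in this repo.

```python
import re

# Assumption: a lowercase letter followed directly by an uppercase letter
# marks two words that were joined without a space ("installDurable").
_CASE_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

# Assumption: punctuation between a lowercase letter and another letter
# marks a missing space after the punctuation ("rust.Includes").
_PUNCT_BOUNDARY = re.compile(r"(?<=[a-z])([.,;:!?])(?=[A-Za-z])")


def split_connected_words(text: str) -> str:
    """Insert spaces where two words appear to have been glued together."""
    text = _PUNCT_BOUNDARY.sub(r"\1 ", text)  # "rust.Includes" -> "rust. Includes"
    text = _CASE_BOUNDARY.sub(" ", text)      # "installDurable" -> "install Durable"
    return text


print(split_connected_words("Easy to installDurable steel resists rust.Includes hardware"))
```

One caveat with this heuristic: it also splits mixed-case brand names (for example, `DeWalt` becomes `De Walt`), so a whitelist of such names may be needed.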

Add new similarity features:

  1. Jaro distance.
  2. Normalized compression distance.
  3. Smith-Waterman algorithm to compute similarity.

If time allows, I will run PCA on our features and try using SVM for regression.

Divide all features into three groups. Run SVM and XGBoost on the similarity features. Plot model performance over the feature space. Apply PCA before running the models on all numerical features. Investigate NCD, Jaro, and the Smith-Waterman algorithm, and describe them in plain English.

jwang917 commented 5 years ago

Done computing the Jaro score.
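In plain English, the Jaro score counts characters that the two strings share within a limited window of each other, then penalizes shared characters that appear in a different order. The following is a minimal sketch of the standard definition, not necessarily the implementation used in this repo; the function name `jaro_similarity` is an assumption.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only count as matching if they are this close in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Walk the matched characters of both strings in order; each position
    # where they differ is half of a transposition.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    transpositions = half_transpositions // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


print(round(jaro_similarity("MARTHA", "MARHTA"), 4))  # 0.9444
```

The Jaro distance is then `1 - jaro_similarity(s1, s2)`, if a distance rather than a similarity is wanted as the feature.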

jwang917 commented 5 years ago

Done with the Smith-Waterman similarity score.
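In plain English, Smith-Waterman finds the best-scoring pair of substrings, one from each string, rewarding matching characters and penalizing mismatches and gaps. The sketch below is one way to turn that into a similarity score. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the normalization by the shorter string's length are assumptions for illustration, not values taken from this repo.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]  # first column is always 0
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def smith_waterman_similarity(a: str, b: str, match: int = 2,
                              mismatch: int = -1, gap: int = -1) -> float:
    """Score scaled to [0, 1] by the best score the shorter string could reach."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return smith_waterman_score(a, b, match, mismatch, gap) / (match * shorter)


print(smith_waterman_similarity("hammer", "claw hammer"))  # 1.0
```

Because the alignment is local, a query that appears verbatim inside a longer product title scores 1.0 under this normalization, which may or may not be the desired behavior for a relevance feature.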

jwang917 commented 5 years ago

Done with Normalized Compression Distance.
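In plain English, the normalized compression distance says two strings are similar if compressing them together takes barely more space than compressing the larger one alone. A minimal sketch follows, assuming `zlib` as the compressor; the issue does not say which compressor was used, and the helper names are illustrative.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes after zlib compression at the highest level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def normalized_compression_distance(x: str, y: str) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    # zlib output is never empty, so the denominator is always positive.
    return (cxy - min(cx, cy)) / max(cx, cy)


query = "cordless drill 18v lithium ion battery " * 20
other = "white ceramic bathroom floor tile 12 in x 12 in " * 20
print(normalized_compression_distance(query, query))  # close to 0
print(normalized_compression_distance(query, other))  # noticeably larger
```

For very short strings such as typical search queries, the fixed overhead of the compressed format dominates the sizes, so the resulting values are noisy; that is worth keeping in mind when using NCD as a feature.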

jwang917 commented 5 years ago

Going to do the PCA and SVM regression after my afternoon class, and also a small data exploration of the features finished last weekend.

jwang917 commented 5 years ago

Done dividing the features. Working on getting results for SVM regression.

robertisandor commented 5 years ago

Features have been tested and integrated into the code. Thanks for your help!