Closed jwang917 closed 5 years ago
Done with Jaro score computing.
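For anyone following along, here is a minimal sketch of how a Jaro score can be computed. It is not necessarily the code that went into the project; the function name `jaro` and the pure-Python approach are assumptions for illustration. Two characters count as a match when they are equal and no further apart than half the longer string's length minus one; the score averages the match ratio for each string and the share of matches that are not transposed.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only match if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3
```

As a usage example, `jaro("MARTHA", "MARHTA")` finds six matches and one transposition, so the score is (1 + 1 + 5/6) / 3, roughly 0.944.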
Done with Smith-Waterman similarity score
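A minimal sketch of Smith-Waterman local alignment scoring, in case it helps the plain-English write-up. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the normalization in `sw_similarity` are assumptions chosen for illustration, not the settings used in the project. The idea is to fill a dynamic-programming table where each cell holds the best score of any alignment ending at that pair of characters, never dropping below zero, and to keep the largest cell.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                 # start a fresh local alignment
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,     # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def sw_similarity(a: str, b: str, match: int = 2,
                  mismatch: int = -1, gap: int = -1) -> float:
    """Score scaled to [0, 1] by the best score the shorter string could get.

    This normalization is an assumption; other scalings are possible.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return smith_waterman(a, b, match, mismatch, gap) / (match * shorter)
```

With these settings, a query that appears verbatim inside a longer product title gets a similarity of 1.0, which is why a local alignment can suit search-term matching better than a global one.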
Done with Normalized Compression Distance.
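A minimal sketch of Normalized Compression Distance, assuming `zlib` as the compressor (the thread does not say which compressor was used). With C(x) as the compressed size of x, the distance is (C(xy) - min(C(x), C(y))) / max(C(x), C(y)): if two strings share a lot of content, compressing them together costs little more than compressing one of them alone, so the value is close to 0.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data at maximum compression."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance; lower means more similar."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

One caveat: on very short strings the compressor's fixed overhead dominates the result, so NCD is likely to be more informative on descriptions than on short queries.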
Going to do the PCA and SVM regression after my afternoon class, and a small data exploration on the features finished last weekend.
Done dividing the features. Working on getting results for SVM regression.
Features have been tested and integrated into the code. Thanks for your help!
Clean up the description, query, and attributes by separating run-together words and fixing some other issues.
Add new similarity features:
If time allows, I will run PCA on our features and try using SVM for regression.
Divide all features into three groups. Run SVM and XGBoost on the similarity features. Plot model performance over the feature space. Add PCA before running on all numerical features. Investigate the NCD, Jaro, and Smith-Waterman algorithms, and describe them in plain English.
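The "PCA before SVM regression" step could look roughly like the sketch below, assuming scikit-learn. The data here is synthetic placeholder data, not our features, and the choices of `n_components=4`, the RBF kernel, `C`, `epsilon`, and RMSE as the metric are all assumptions for illustration. Scaling comes first because both PCA and SVR are sensitive to feature scale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the numerical feature matrix and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Scale, project onto the leading principal components, then fit SVR.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=4),
    SVR(kernel="rbf", C=1.0, epsilon=0.1),
)

# Simple holdout split: first 150 rows for training, last 50 for evaluation.
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])
rmse = float(np.sqrt(np.mean((pred - y[150:]) ** 2)))
print(f"holdout RMSE: {rmse:.3f}")
```

Putting the steps in one pipeline means the scaler and PCA are fitted on the training rows only, which avoids leaking holdout information into the projection.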