Closed jamessdixon closed 8 years ago
I'll take it.
@tjaskula sorry, I should've commented here saying I was working on this. Just submitted PR #10, but there's still work to do re: fuzzy matching if you want to work on that?
Merged it in. I think fuzzy matching might be a good thing later on. Let me see what our Kaggle score is now
I'm not sure how it'll affect the score, but the previous implementation was a bit more lenient because it was doing a `String.Contains`. Before, `angles` would've been a match for `angle`, but now it'll only match `angle` = `angle`. The new version will probably exclude matches that should be valid matches. I think doing some simple stemming of the words before matching might increase the score.
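To make the difference concrete, here's a rough sketch in Python (not the project's F# code; every function name below is made up for illustration). It compares the substring check, the exact check, and a very crude plural-stripping stemmer, which is only one assumption about what "simple stemming" could look like:

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# taken from the F# or R implementations discussed in this thread.

def contains_match(term: str, word: str) -> bool:
    """Lenient check: true when the term appears anywhere inside the word."""
    return term in word


def exact_match(term: str, word: str) -> bool:
    """Strict check: true only when the two strings are identical."""
    return term == word


def naive_stem(word: str) -> str:
    """Crude stemmer: lowercases and drops one trailing 's' (but not 'ss')."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stemmed_match(term: str, word: str) -> bool:
    """Exact comparison applied after stemming both sides."""
    return naive_stem(term) == naive_stem(word)


if __name__ == "__main__":
    for word in ("angle", "angles", "triangle"):
        print(word,
              contains_match("angle", word),
              exact_match("angle", word),
              stemmed_match("angle", word))
```

The substring check accepts `angles` but also `triangle`, the exact check rejects both, and the stemmed comparison accepts `angles` while still rejecting `triangle`. A real stemmer (Porter or similar) would handle far more suffixes than this.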
Thinking ahead, I'm not sure if there are any typos in the data, but I guess we could deal with those using some string distance tolerance.
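As a sketch of what a string distance tolerance could look like, here's a plain Levenshtein edit distance in Python (again not the project's F# code). The `fuzzy_match` helper and its default threshold of 1 are assumptions for illustration; the right tolerance would need to be tuned against the data:

```python
# Illustrative sketch only: edit_distance and fuzzy_match are hypothetical
# names, and max_distance=1 is an assumed threshold, not a tuned value.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term: str, word: str, max_distance: int = 1) -> bool:
    """True when the case-insensitive edit distance is within the tolerance."""
    return edit_distance(term.lower(), word.lower()) <= max_distance


if __name__ == "__main__":
    for word in ("angle", "angles", "anle", "angel", "bracket"):
        print(word, edit_distance("angle", word), fuzzy_match("angle", word))
```

With a tolerance of 1 this accepts a single missing or extra character (`anle`, `angles`) but not a transposition such as `angel`, which counts as two edits under plain Levenshtein.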
Taylor, not a problem. I hadn't started working on it. I'll describe all the other issues in Trello, so moving a card will show who's working on which part :)
I am running it right now. Will let you know in a bit. Note that the F# implementation takes longer than the R one when applying wordMap to the train and test data...
@jamessdixon Yeah, it takes a while on my machine. I'll see what I can optimize to get the run time down. I suppose moving some of that code into a module and compiling it in release mode might help.
I like how you changed the productDescription function. Much cleaner than my code.
Line 53 does an exact match. We need a fuzzy match to reproduce the results of the R project.