jamessdixon / Kaggle.HomeDepot

Repo for Kaggle Competition
MIT License
11 stars, 10 forks

Implement RegEx in benchmarkScore.fsx #1

Closed: jamessdixon closed this 8 years ago

jamessdixon commented 8 years ago

Line 53 does an exact match. It needs to do a fuzzy match to reproduce the results of the R project.

tjaskula commented 8 years ago

I'll take it.

taylorwood commented 8 years ago

@tjaskula sorry, I should've commented here saying I was working on this. Just submitted PR #10, but there's still work to do re: fuzzy matching if you want to work on that?

jamessdixon commented 8 years ago

Merged it in. I think fuzzy matching might be a good thing to add later on. Let me see what our Kaggle score is now.

taylorwood commented 8 years ago

I'm not sure how it'll affect the score, but the previous implementation was a bit more lenient because it used String.Contains. Before, "angles" would've been a match for "angle", but now "angle" only matches "angle". The new version will probably exclude some matches that should be valid. I think doing some simple stemming of the words before matching might increase the score.
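To make the difference concrete, here is a minimal sketch of the three matching behaviours discussed above: substring containment, exact whole-word matching, and exact matching after stemming. It is written in Python rather than the repo's F#, and it is not the code from benchmarkScore.fsx. The function names, the sample strings, and the crude "strip a trailing s" stemmer are all assumptions made for illustration; a real stemmer (Porter, for example) would handle many more cases.

```python
import re

def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_stem(word):
    """Hypothetical, deliberately crude stemmer: drop one trailing 's'.

    Words of three letters or fewer, and words ending in 'ss', are left alone.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def contains_match(term, text):
    """Lenient: true if the term appears anywhere as a substring."""
    return term.lower() in text.lower()

def exact_match(term, text):
    """Strict: true only if the term equals a whole word of the text."""
    return term.lower() in tokens(text)

def stemmed_match(term, text):
    """Whole-word match after stemming both the term and the text."""
    stems = {naive_stem(t) for t in tokens(text)}
    return naive_stem(term.lower()) in stems

# Made-up sample strings, not taken from the competition data.
plural = "steel angles for decking"
unrelated = "rectangle table"

print(contains_match("angle", plural), exact_match("angle", plural), stemmed_match("angle", plural))
print(contains_match("angle", unrelated), exact_match("angle", unrelated), stemmed_match("angle", unrelated))
```

In this sketch the substring check accepts both strings, including the unrelated "rectangle", while the exact check rejects both. The stemmed check accepts the plural and rejects "rectangle", which is the behaviour the stemming suggestion is aiming for.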

Thinking ahead, I'm not sure if there are any typos in the data, but I guess we could deal with those using some string distance tolerance.
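As a rough illustration of the string distance idea, the sketch below uses Levenshtein edit distance with a small tolerance. The thread does not say which distance metric or threshold would be used, so the choice of Levenshtein, the `fuzzy_match` helper, and the `max_distance` parameter are all assumptions. It is in Python rather than the repo's F#.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

def fuzzy_match(term, word, max_distance=1):
    """Hypothetical helper: accept words within max_distance edits of the term."""
    return levenshtein(term.lower(), word.lower()) <= max_distance

# Made-up examples, not taken from the competition data.
print(fuzzy_match("bracket", "braket"))  # one missing letter
print(fuzzy_match("angle", "ankle"))     # a different real word, also one edit away
```

The second example shows the trade-off: a tolerance that is loose enough to absorb typos also accepts unrelated words that happen to be one edit apart, so the threshold would need tuning against the score.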

tjaskula commented 8 years ago

Taylor, not a problem. I hadn't started working on it. I'll describe all the other issues in Trello, so moving a card will show who's working on which part :)

jamessdixon commented 8 years ago

I am running it right now. Will let you know in a bit. Note that the F# implementation takes longer than the R one when applying wordMap to the train and test data...

taylorwood commented 8 years ago

@jamessdixon Yeah, it takes a while on my machine. I'll see what I can optimize to get the run time down. I suppose moving some of that code into a module and compiling it in release mode might help.

jamessdixon commented 8 years ago

I like how you changed the productDescription function. Much cleaner than my code.