soton-data-mining / job-salary-prediction

A regression problem, predicting salaries of jobs in UK based on various criteria

implement kNN search for tf.idf similarity #33

Open blanche opened 7 years ago

blanche commented 7 years ago

should be faster than calculating the entire matrix

see #32

(creating this issue for documentation purposes)
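For reference, a minimal sketch of what such a kNN search could look like, assuming scikit-learn is available. The toy descriptions, salaries, `n_neighbors=2`, and the `knn_salary` helper are all made up for illustration and are not taken from the PR. A brute-force cosine search still compares each query against every row, but it only keeps the top k neighbours per query instead of holding the full n x n similarity matrix.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy stand-ins for the job descriptions and salaries (hypothetical data).
descriptions = [
    "senior python developer london",
    "junior python developer",
    "registered nurse night shift",
    "staff nurse hospital ward",
    "java developer finance london",
]
salaries = np.array([60000.0, 30000.0, 28000.0, 27000.0, 55000.0])

# Sparse tf-idf matrix, one row per description.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(descriptions)

# Brute-force search with the cosine metric accepts sparse input directly.
knn = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute")
knn.fit(X)

# Look up the k most similar descriptions for a new ad.
query = vectorizer.transform(["python developer"])
distances, indices = knn.kneighbors(query)
similarities = 1.0 - distances  # cosine distance -> cosine similarity


def knn_salary(similarities, indices, salaries):
    """Similarity-weighted mean salary of the retrieved neighbours."""
    weights = similarities + 1e-12  # avoid division by zero if all sims are 0
    return (weights * salaries[indices]).sum(axis=1) / weights.sum(axis=1)


predicted = knn_salary(similarities, indices, salaries)
print(indices[0], predicted[0])
```

The same `kneighbors` call works on a whole batch of queries at once, so the test set can be scored without ever materialising the train x train matrix.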

utkuozbulak commented 7 years ago

I checked the PR, but I can't find a way to incorporate this into the feature set and feed it to an ML algo. Also, the feature set is already huge as it is (86 features, to be precise).

Is there a way we can use the description as part of EDA and merge all the following issues? What will we do?

#24, #32, #30, #5, #22

blanche commented 7 years ago

can't we just work with the similarity matrix as weights? or use the kNN output and add it to the other prediction? or combine them like multiple NNs or boosting or idk

it's just that this similarity thing is kind of powerful - the standalone model, not using any other features, had an error of ~9k (iirc) compared to the lin reg model with ~10k (?)

yeah sure, we can extract some features from the description, but I doubt it would be as good, although I see that we have to do it if there is no other option (but the weighted combination HAS to work... just learn weights on how to combine the different predictions(?))
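A rough sketch of the "learn weights to combine the predictions" idea, assuming scikit-learn. The predictions below are synthetic (true salary plus random noise), not output from the actual models, and `pred_knn` / `pred_linreg` are placeholder names. In practice the two prediction columns should be out-of-fold predictions so the blender does not learn from leaked targets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Synthetic targets and two noisy "model outputs" (illustrative only).
y = rng.uniform(20000, 80000, size=n)
pred_knn = y + rng.normal(0, 9000, size=n)      # stand-in for the kNN model
pred_linreg = y + rng.normal(0, 10000, size=n)  # stand-in for the lin reg model

# Stack the base predictions as two input columns and fit a linear blender:
# blended = w1 * pred_knn + w2 * pred_linreg + b
P = np.column_stack([pred_knn, pred_linreg])
blender = LinearRegression().fit(P, y)
blended = blender.predict(P)


def rmse(a, b):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


print("weights:", blender.coef_, "intercept:", blender.intercept_)
print("kNN:", rmse(pred_knn, y), "lin reg:", rmse(pred_linreg, y),
      "blend:", rmse(blended, y))
```

This keeps the 86-feature model untouched: the text model only contributes one extra column at the blending stage, so nothing has to be merged into the feature set itself.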

utkuozbulak commented 7 years ago

I'll look at it again (I'm terrible at this text stuff <.< )