soton-data-mining / job-salary-prediction

A regression problem, predicting salaries of jobs in UK based on various criteria

implement evaluation functionality #25

Closed: blanche closed this issue 7 years ago

blanche commented 7 years ago

implement a basic framework of functions that lets us plug in the prediction (not part of this task) and then calculates the mean squared error for us

this should also serve as a cleanup of main.py, which currently is just a concatenation of random functions

along the lines of:

result = predict_salary(foo)  # so that later we only have to focus on the contents of this function
calculate_error(result)
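
A minimal sketch of how these two calls could fit together. It assumes `predict_salary` takes a list of rows and returns one prediction per row, and that `calculate_error` also receives the true salaries (the snippet above passes only `result`, so the second argument is an assumption). The constant baseline and the sample values are placeholders, not project code.

```python
from typing import List, Sequence


def predict_salary(rows: Sequence[dict]) -> List[float]:
    """Hypothetical placeholder model: predicts the same salary for every row."""
    return [30000.0 for _ in rows]


def calculate_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between predicted and actual salaries."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("cannot compute an error on empty input")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared) / len(squared)


# Made-up rows and salaries, for illustration only.
rows = [{"title": "developer"}, {"title": "nurse"}]
actual_salaries = [29000.0, 32000.0]

result = predict_salary(rows)
error = calculate_error(result, actual_salaries)
```

With this split, swapping in a different model only means changing the body of `predict_salary`; the input data and the error calculation stay the same for every experiment.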
blanche commented 7 years ago

insert discussion on how we should go about this here

utkuozbulak commented 7 years ago

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html It already exists.

One thing I would argue we should implement is computing the average error. Say we calculated the MSE and it is 2890 for 20,000 test examples. We should have a function that calculates the average error, e.g. 2890/20,000 = xx. The motivation is to compare our results with the Kaggle results; I feel this is the only way we can compare ourselves with the Kaggle leaderboard.

The top score on Kaggle is 3464.55935. We know the size of the validation set, so we can find the average score, 3464.55935/len(validation), and use it as a benchmark for our results (even though we will have less training data, since we will be holding out part of the training set).

blanche commented 7 years ago

it's not particularly about calculating the MSE itself, but more about structuring the code so that we only have to return the prediction and the rest is handled for us in a standardized way (same input data & same error calculation)

this helps to 1) get immediate feedback on any changes we make 2) make it easier to evaluate different models

semi-related: isn't kaggle using mean absolute error, or is that what you were saying?

utkuozbulak commented 7 years ago

Oh never mind, I was wrong on so many levels :smile: it is already averaged :rofl:
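
To make the "already averaged" point concrete, here is a small sketch in plain Python. Both the mean absolute error and the mean squared error are sums over the samples divided by the number of samples, so dividing the reported score by the set size again would average twice. scikit-learn's `mean_squared_error`, linked above, likewise averages over samples. The salary values below are made up for illustration.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of absolute errors divided by the number of samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared errors divided by the number of samples."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up salaries, for illustration only.
actual = [20000.0, 30000.0, 40000.0, 50000.0]
predicted = [22000.0, 27000.0, 40000.0, 51000.0]

mae = mean_absolute_error(actual, predicted)  # (2000 + 3000 + 0 + 1000) / 4
mse = mean_squared_error(actual, predicted)   # (4e6 + 9e6 + 0 + 1e6) / 4
```

Because both values are per-sample averages, a score computed on a smaller held-out set is on the same scale as a leaderboard score and can be compared with it directly.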

blanche commented 7 years ago

i think the way to go here is to implement an abstract class, representing a prediction model

each model will then just implement its prediction function and automatically get access to standardized input documents and error calculation
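
A rough sketch of what such an abstract class could look like, using Python's `abc` module. The names `PredictionModel`, `MeanBaseline`, `predict` and `evaluate`, the constructor arguments, and the choice of mean squared error are all assumptions made for illustration; they are not taken from the change that closed this issue.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class PredictionModel(ABC):
    """Hypothetical base class: holds the shared data and the error calculation."""

    def __init__(self, train_rows: Sequence[dict], train_salaries: Sequence[float],
                 test_rows: Sequence[dict], test_salaries: Sequence[float]) -> None:
        self.train_rows = train_rows
        self.train_salaries = train_salaries
        self.test_rows = test_rows
        self.test_salaries = test_salaries

    @abstractmethod
    def predict(self) -> List[float]:
        """Return one predicted salary per row in self.test_rows."""

    def evaluate(self) -> float:
        """Run predict() and return the mean squared error on the test salaries."""
        predictions = self.predict()
        if len(predictions) != len(self.test_salaries):
            raise ValueError("predict() must return one value per test row")
        squared = [(p - a) ** 2 for p, a in zip(predictions, self.test_salaries)]
        return sum(squared) / len(squared)


class MeanBaseline(PredictionModel):
    """Example model: always predicts the mean training salary."""

    def predict(self) -> List[float]:
        mean = sum(self.train_salaries) / len(self.train_salaries)
        return [mean for _ in self.test_rows]


# Made-up data, for illustration only.
model = MeanBaseline([{}, {}], [20000.0, 40000.0], [{}, {}], [25000.0, 36000.0])
score = model.evaluate()
```

A new model then only overrides `predict`; instantiating `PredictionModel` directly raises a `TypeError` because `predict` is abstract, which guards against forgetting to implement it.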

blanche commented 7 years ago

closed by #26