blanche closed this issue 7 years ago
insert discussion on how we should go about this here
This already exists: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html
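For reference, a minimal usage sketch of sklearn's `mean_squared_error` (the sample values below are made up just to show the call):

```python
from sklearn.metrics import mean_squared_error

# dummy ground truth and predictions, purely for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# averages the squared residuals over all samples:
# (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
mse = mean_squared_error(y_true, y_pred)
```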
One thing I would argue we should implement is getting the average error. Say we calculated the MSE and it is 2890 for 20,000 test examples. We should have a function to calculate the average error, e.g. 2890/20,000 = xx. The motivation is to compare our results with Kaggle; I feel like this is the only way we can compare our results with the Kaggle leaderboard.
The top score on Kaggle is 3464.55935. Since we know the size of the validation set, we can find the average score, 3464.55935/len(validation), and use that as a benchmark for our results. (Even though we will have a smaller training set, since we will be holding out part of it for validation.)
It's not about calculating the MSE itself, but more about structuring the code so we only have to return the prediction and the rest is handled for us in a standardized way (same input data and same error calculation).
This helps to 1) get immediate feedback on any changes we make and 2) make it easier to evaluate different models.
Semi-related: isn't Kaggle using mean average error, or is that what you were saying?
Oh never mind, I was wrong on so many levels :smile: it is already averaged :rofl:
I think the way to go here is to implement an abstract class representing a prediction model.
Each model will then just implement its prediction function and automatically get access to the standardized input documents and error calculation.
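As a rough sketch of that abstract class (all names here are placeholders, not decided yet):

```python
from abc import ABC, abstractmethod

class PredictionModel(ABC):
    """Each concrete model only implements predict();
    input handling and error calculation are shared."""

    @abstractmethod
    def predict(self, documents):
        """Return one prediction per input document."""

    def evaluate(self, documents, y_true):
        # standardized error calculation: mean squared error
        y_pred = self.predict(documents)
        return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

class ConstantModel(PredictionModel):
    """Trivial example model that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, documents):
        return [self.value] * len(documents)
```

A new model would then only subclass `PredictionModel` and fill in `predict`; `evaluate` gives every model the same error calculation for free.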
closed by #26
Implement a basic framework of functions which lets us plug in the prediction (not part of this task) and then calculates the mean squared error for you.
This should also serve as a cleanup of main.py, which currently is just a concatenation of random functions.
along the lines of:
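A rough sketch of what that framework could look like (the function names and the dummy data are placeholders, purely illustrative):

```python
def load_data():
    # placeholder: in the real project this would return the shared
    # train/validation split; dummy values here just to make it runnable
    X_train, y_train = [[0], [1], [2]], [0.0, 1.0, 2.0]
    X_val, y_val = [[3], [4]], [3.0, 4.0]
    return X_train, y_train, X_val, y_val

def mse(y_true, y_pred):
    """Mean squared error over the validation set."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate(predict):
    """Run any model's predict function against the shared data
    and report its MSE in a standardized way."""
    X_train, y_train, X_val, y_val = load_data()
    return mse(y_val, predict(X_train, y_train, X_val))

# example model: always predicts the training mean
def mean_model(X_train, y_train, X_val):
    mean = sum(y_train) / len(y_train)
    return [mean] * len(X_val)
```

With this in place, main.py shrinks to calling `evaluate(some_model)` per model instead of a concatenation of random functions.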