Open ntg24gr opened 3 years ago
The SVM model's lackluster performance is due to the massive vaccination effort under way around the world. A model that learns from pre-vaccination COVID data will therefore overestimate the rate of increase. I chose a larger training set so that the models could learn from the effects of the vaccines.
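To illustrate the overestimation argument, here is a minimal sketch. It assumes made-up cumulative counts whose daily growth slows after day 60, and it uses a plain least-squares line in place of the SVM to keep the example dependency-free. The helper `fit_line` and all numbers are hypothetical and are not taken from the notebook.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Synthetic cumulative counts (assumption, not real case data):
# growth of 100 per day for 60 days, then 40 per day once growth slows.
days = list(range(90))
cases = [100 * d if d < 60 else 6000 + 40 * (d - 60) for d in days]

# Fit only on the early, fast-growth period, then extrapolate to the last day.
slope, intercept = fit_line(days[:60], cases[:60])
forecast_day_89 = slope * 89 + intercept
actual_day_89 = cases[89]

print(f"forecast: {forecast_day_89:.0f}, actual: {actual_day_89}")
```

A model that has only seen the early regime keeps projecting the early growth rate, so its forecast ends up above the actual value once growth slows.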
The SVM model performed poorly because its hyperparameters were optimized for pre-covid data, while the other models are optimized for the current data.
I may address this in a future update, and perhaps add more localized predictions.
As you can see, setting the test set to 15% does not fix the problem with the model. I think it has more to do with the SVM model's hyperparameters and the nature of the data.
I think a multi-variable COVID prediction model would give more accurate results than a single-variable one.
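If the hyperparameters are the issue, one option is to re-tune them on recent data with a time-ordered cross-validation. The sketch below assumes scikit-learn is installed and uses synthetic data in place of `days_since_1_22` and `world_cases`. The parameter grid, the scaling pipeline, and the 15% hold-out are illustrative choices, not the notebook's actual settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real series (assumption, not real case data).
days = np.arange(120, dtype=float).reshape(-1, 1)
cases = 1000.0 + 50.0 * days.ravel() + 0.5 * days.ravel() ** 2

# Chronological hold-out: the last 15% of days form the test set.
X_train, X_test, y_train, y_test = train_test_split(
    days, cases, test_size=0.15, shuffle=False
)

# Scale the day index before the SVR; step names come from make_pipeline.
pipeline = make_pipeline(StandardScaler(), SVR())
param_grid = {
    "svr__kernel": ["poly", "rbf"],
    "svr__C": [1.0, 10.0, 100.0],
    "svr__epsilon": [0.1, 1.0],
}

# TimeSeriesSplit always validates on days that come after the training fold.
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=TimeSeriesSplit(n_splits=4),
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_predictions = search.predict(X_test)
print(best_params)
```

Even with re-tuned hyperparameters, RBF and polynomial SVR models can extrapolate poorly beyond the training range, so the held-out error is still worth checking.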
Hello, great work! I am trying to learn from your code, and I have a question about your train/test split: why did you hold out only 5% for testing? As far as I know, the usual split is 80:20 for training:testing.
X_train_confirmed, X_test_confirmed, y_train_confirmed, y_test_confirmed = train_test_split(days_since_1_22[50:], world_cases[50:], test_size=0.05, shuffle=False)
In addition, since the beginning of the year the SVM predictions have been poor, whereas they were very good before. What do you think is the reason? Overfitting? Thank you.
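For reference, `test_size=0.05` with `shuffle=False` in the call above keeps the first 95% of days for training and holds out the most recent 5% for testing. The sketch below mirrors that behaviour in plain Python. The helper `chronological_split` and the 400-day series are hypothetical, and the rounding (test count rounded up) is my understanding of how scikit-learn handles a fractional `test_size`.

```python
import math


def chronological_split(samples, test_size):
    """Split an ordered sequence so the last `test_size` fraction is the test set."""
    n_test = math.ceil(test_size * len(samples))
    n_train = len(samples) - n_test
    return samples[:n_train], samples[n_train:]


# Hypothetical day indices standing in for days_since_1_22[50:].
days = list(range(50, 450))  # 400 days

train_5, test_5 = chronological_split(days, test_size=0.05)
train_20, test_20 = chronological_split(days, test_size=0.20)

print(len(train_5), len(test_5))    # sizes with a 5% hold-out
print(len(train_20), len(test_20))  # sizes with a 20% hold-out
```

With an unshuffled time series the test set is always the latest stretch of days, so a larger `test_size` means the model is trained on older data and evaluated over a longer forecast horizon.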