Closed HavokSahil closed 7 months ago
I am interested in addressing this issue. After carefully analyzing the problem, I believe that using a higher-order polynomial together with regularization would be an effective way to handle the overfitting.
Additionally, I propose augmenting the dataset by incorporating more data through web scraping. This, in turn, would contribute to the robustness of the model and improve its performance.
@re4lvanshsingh I am enthusiastic about contributing to the project and would appreciate it if you could assign me this particular issue under Codepeak'23.
Sure thing. You've got some good insights and I was kind of planning to change the underlying ML Model as well.
I have assigned this issue to you. Here's what I want:
1) Using web-scraping tools like BeautifulSoup in Python:
Extract the username, the past 5 to 10 ratings, the past 5 to 10 contest ranks, the number of contests participated in, and the number of accepted solutions (solved problems, basically).
Prepare a comma-separated values (CSV) or Excel file of the same.
2) Train various ML models (Polynomial Regression, Neural Networks, etc.) on the dataset, split 70:30 into training and testing sets, and employ the highest-performing model.
3) Using matplotlib, plot the points for visualisation.
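As a rough sketch of the CSV and 70:30 split parts of steps 1 and 2 above (the scraping itself is left out): the column names, the `write_csv` and `train_test_split` helpers, and the sample records below are all hypothetical placeholders, not the project's actual code or data.

```python
import csv
import io
import random

# Hypothetical column names; the real scraper output may differ.
FIELDS = ["username", "ratings", "ranks", "contests", "solved"]


def write_csv(records, out):
    """Write scraped records (a list of dicts) as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        # Store list-valued columns as ';'-joined strings so each user is one row.
        row["ratings"] = ";".join(str(r) for r in rec["ratings"])
        row["ranks"] = ";".join(str(r) for r in rec["ranks"])
        writer.writerow(row)


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Made-up records standing in for scraped data.
records = [
    {
        "username": f"user{i}",
        "ratings": [1200 + i, 1250 + i],
        "ranks": [900 - i, 850 - i],
        "contests": 10 + i,
        "solved": 100 + 5 * i,
    }
    for i in range(10)
]

# An in-memory buffer keeps the sketch self-contained; a real run would
# use open("some_file.csv", "w", newline="") instead.
buffer = io.StringIO()
write_csv(records, buffer)
buffer.seek(0)
rows = list(csv.DictReader(buffer))

train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the shuffle seed keeps the split reproducible, so different models can be compared on the same test rows.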
Right now, I have assigned the web-scraping task to you as a medium task.
@HavokSahil please comment on the web-scraping issue to get it assigned. I will close this thread afterwards.
@re4lvanshsingh I have sent the pull request. I have added a new folder for Web-Scrapper. #6
Description
I have observed that the model's predictions for CodeChef ratings tend to overshoot, especially at lower Codeforces ratings, due to the use of a degree-3 polynomial. This behavior reduces the accuracy of the predictions.
Expected Behavior
The model should, however, give consistent predictions for both lower and higher rating inputs, within a threshold value.
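One way to probe the behavior described above is to fit the cubic with and without an L2 (ridge) penalty and compare predictions at the ends of the rating range. This is a minimal, dependency-free sketch and not the project's model: the rating pairs are made up, `SCALE` and the `lam` values are arbitrary assumptions, and in practice NumPy or scikit-learn would likely replace the hand-written solver.

```python
def solve(a, b):
    """Solve the linear system a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_ridge_poly(xs, ys, degree, lam):
    """Least-squares polynomial fit with an L2 penalty lam on all non-constant coefficients."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    d = degree + 1
    # Normal equations: (X^T X + lam * I') w = X^T y, where I' skips the intercept.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    for k in range(1, d):
        a[k][k] += lam
    return solve(a, b)


def predict(w, x):
    """Evaluate the polynomial with coefficients w (lowest degree first) at x."""
    return sum(wk * x ** k for k, wk in enumerate(w))


# Made-up (Codeforces, CodeChef) rating pairs, for illustration only.
cf = [800, 1000, 1200, 1400, 1600, 1900, 2100, 2400]
cc = [1350, 1420, 1500, 1600, 1690, 1850, 1980, 2200]
SCALE = 3000.0  # assumed scaling constant so that powers of x stay small
xs = [r / SCALE for r in cf]
ys = [r / SCALE for r in cc]

w_plain = fit_ridge_poly(xs, ys, degree=3, lam=0.0)   # unregularized cubic
w_ridge = fit_ridge_poly(xs, ys, degree=3, lam=0.01)  # ridge-penalized cubic

# Compare both fits inside and outside the range covered by the data.
for rating in (400, 800, 2400, 3000):
    x = rating / SCALE
    print(rating, round(predict(w_plain, x) * SCALE), round(predict(w_ridge, x) * SCALE))
```

The penalty shrinks the non-constant coefficients, which tends to damp extreme predictions outside the training range. Whether it fixes the overshoot on the real data would need to be checked against a held-out test set.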