Summary:
This project is about predicting whether a patient has COVID-19. It uses data from the Israeli Ministry of Health. In the report, the team presents an exploratory data analysis (EDA) of the number of positive COVID-19 cases over time, as well as tables showing the proportion of positive tests for a given characteristic or symptom. The team also shows its progress by presenting results from a logistic regression model and a k-nearest neighbors (k-NN) model.
Things I like:
I liked the description of the data cleaning process, which gave insight into how you are interpreting the information in the dataset. I originally thought that the dataset was entirely in English, and I appreciate the clarification that the data needed to be translated.
I also like that you included the two tables, which give good insight into what is happening rather than leaving the reader with a block of text.
The planned future work seems straightforward and achievable by the end of the semester. The future work section is nicely detailed and tells us exactly what to expect in the next report.
Areas for Improvement:
The abstract, as currently written, feels somewhat unnecessary. Its content would work well as a verbal overview at the start of a presentation to a manager, before going into the details, but in a written report readers can get the same information just by looking at the section titles.
I think you should specify where the individuals in the dataset are from, since this is ambiguous in your description of the data.
Aside from the runtime and the accuracy being close to that of the logistic regression, there are other reasons why a k-NN model may be inappropriate. One becomes apparent when you look at the tables you included: k-NN is not very useful when the groups are not reasonably well separated, and your tables already show that no characteristic or symptom gives a very clear separation. It would also be worth looking at the false-negative rate (the share of COVID-positive patients the model labels negative) and comparing it to that of the logistic regression before dismissing the model.
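To illustrate why the false-negative rate is worth checking even when accuracies are close, here is a minimal sketch in plain Python. It computes the false-negative rate as FN / (FN + TP) alongside accuracy. The label lists are hypothetical placeholders, not results from your report, and it assumes positives are coded as 1.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_negative_rate(y_true, y_pred, positive=1):
    """Fraction of actual positives predicted as negative: FN / (FN + TP)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    if fn + tp == 0:
        raise ValueError("y_true contains no positive cases")
    return fn / (fn + tp)


# Hypothetical labels for illustration only (1 = COVID-19 positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_logreg = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical logistic regression predictions
y_knn = [1, 0, 0, 1, 0, 0, 0, 0]     # hypothetical k-NN predictions

for name, preds in [("logistic regression", y_logreg), ("k-NN", y_knn)]:
    print(f"{name}: accuracy = {accuracy(y_true, preds):.2f}, "
          f"FNR = {false_negative_rate(y_true, preds):.2f}")
```

In this made-up example both models reach the same accuracy (0.75), but the k-NN model misses twice as many positive patients (FNR 0.50 versus 0.25), which is the kind of difference that accuracy alone would hide. With real predictions, scikit-learn's `confusion_matrix` gives the same FN and TP counts.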