The team used COVID-19 data released by the Israeli Ministry of Health to train a classifier that predicts, from a patient's reported symptoms, whether the patient is infected. In the EDA, they explored which factors are influential for a patient testing positive. They used logistic regression and KNN for preliminary analysis.
Three things I like
The interpretation of Figure 1 is clear and meaningful: they plotted the wave of increasing positive COVID-19 test cases. Very inspiring to see this. I am also curious about how this observation could be used in the classification problem that they mentioned in their future work.
I like the two graphs they used in the modelling section to explain what they observed during training. They make it very clear to readers how the model is currently performing.
I like the discussion in the logistic regression section about how changing the threshold might influence the false-negative rate, and the trade-off between the false-negative rate (FNR) and the false-positive rate (FPR).
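To make the threshold trade-off concrete, here is a minimal sketch of how FNR and FPR move in opposite directions as the decision threshold changes. The labels and probabilities are invented for illustration and are not taken from the team's data, and `fnr_fpr` is a hypothetical helper, not something from their notebook.

```python
def fnr_fpr(y_true, y_prob, threshold):
    """Return (false-negative rate, false-positive rate) at a given threshold."""
    fn = fp = pos = neg = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1:
            pos += 1
            fn += predicted == 0  # infected patient predicted as negative
        else:
            neg += 1
            fp += predicted == 1  # healthy patient predicted as positive
    return fn / pos, fp / neg


# Hypothetical labels (1 = infected) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1]

for t in (0.25, 0.5, 0.75):
    fnr, fpr = fnr_fpr(y_true, y_prob, t)
    print(f"threshold={t:.2f}  FNR={fnr:.2f}  FPR={fpr:.2f}")
```

On this toy data, lowering the threshold reduces missed infections at the cost of more false alarms, which is the trade-off the team describes.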
Three areas for improvement
In the EDA, the authors say that the relationship between the proportion of positive cases and the other characteristics is interesting, but they don't explain in detail why it is interesting or how it affects their feature engineering or model selection. Readers who are not familiar with the project and the dataset may need some time to understand why it's interesting.
The team didn't say much about data cleaning, specifically the NaN values. They could say more about which columns contain nulls, what the possible reasons are, and what the missingness might indicate. Directly dropping those rows may discard useful signal in the raw data.
I would like to know the ratio of positive to negative test cases and whether it is roughly 50/50. One possible scenario is that most people who go for testing do so because they have specific symptoms or combinations of symptoms, so the positive examples might far outnumber the negative ones. This could bias the training, and the team may need to consider how to balance the dataset.
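As a rough sketch of the two checks suggested above (missing values per column and the class ratio), something like the following would already be informative. The rows and column names here are hypothetical and only for illustration. With a pandas DataFrame, the equivalent checks would be `df.isna().mean()` and `df["result"].value_counts(normalize=True)`, assuming the target column is called `result`.

```python
from collections import Counter

# Hypothetical rows; None marks a missing value. Column names are assumptions.
rows = [
    {"cough": 1, "fever": 0, "age_60_and_above": None, "result": "negative"},
    {"cough": 0, "fever": 0, "age_60_and_above": 0, "result": "negative"},
    {"cough": 1, "fever": 1, "age_60_and_above": 1, "result": "positive"},
    {"cough": None, "fever": 0, "age_60_and_above": None, "result": "negative"},
    {"cough": 0, "fever": 0, "age_60_and_above": 0, "result": "negative"},
]


def missing_share(rows):
    """Fraction of missing values in each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def class_share(rows, target="result"):
    """Fraction of rows in each class of the target column."""
    counts = Counter(r[target] for r in rows if r[target] is not None)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


print(missing_share(rows))
print(class_share(rows))
```

Reporting these two tables before dropping any rows would show how much data is lost by the drop and whether the classes need rebalancing.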