Open matthew-brett opened 11 months ago
Data: We currently have all the E.U. vaccination data by vaccine type and dose, and the excess mortality data for all E.U. countries based on a 2016-2019 baseline. We also have Russia's death and vaccination data as a potential comparison.
We cleaned the E.U. vaccine data and created a base_vaccine dataset in cleaning_vaccine.py, which can serve as a baseline for future analysis or for further cleaning and reorganization. We also cleaned the E.U. death data and aggregated it with groupby, taking means.
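To illustrate the groupby-and-mean step, here is a minimal sketch. It assumes a long-format death table with hypothetical columns `country`, `year` and `excess_mortality`; our real column names may differ, and the numbers are made up.

```python
import pandas as pd

# Made-up rows for illustration; the column names are assumptions,
# not the real schema of our death data.
deaths = pd.DataFrame({
    "country": ["Finland", "Finland", "Italy", "Italy"],
    "year": [2023, 2023, 2023, 2023],
    "excess_mortality": [4.0, 6.0, 10.0, 14.0],
})

# Keep the 2023 rows, then average excess mortality within each country.
mean_excess = (
    deaths[deaths["year"] == 2023]
    .groupby("country")["excess_mortality"]
    .mean()
)
```

The result is a Series indexed by country, which makes it convenient to join with a per-country vaccination table later.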
Initial strategy: Our strategy is to regress the percentage of people who received a first dose against the overall 2023 mean excess mortality rate per country, to see whether there is an initial relationship. We then plan to run a multiple regression that adds vaccine type, to see whether vaccine brand or technology had any impact. After that, we will try to control for factors such as age and health services.
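A rough sketch of the first regression, using ordinary least squares from NumPy. The numbers are invented, the variable names are hypothetical, and treating excess mortality as the outcome and first-dose percentage as the predictor is an assumption about the direction of the regression.

```python
import numpy as np

# Invented values, one per country, for illustration only.
first_dose_pct = np.array([60.0, 70.0, 80.0, 90.0])
mean_excess_2023 = np.array([12.0, 10.0, 8.0, 6.0])

# Design matrix: a column of ones for the intercept, then the predictor.
X = np.column_stack([np.ones_like(first_dose_pct), first_dose_pct])

# Ordinary least squares fit of mean_excess_2023 on first_dose_pct.
coef, _, _, _ = np.linalg.lstsq(X, mean_excess_2023, rcond=None)
intercept, slope = coef
```

The multiple regression would follow the same pattern with extra columns in `X`, for example the share of doses from each vaccine type.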
What we've tried: After the data cleanup, we set up the initial linear regression but found slight index discrepancies, which we will resolve tomorrow before attempting the first linear regression. After that, we will know how to proceed.
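One possible way to diagnose index discrepancies of this kind is an outer merge with `indicator=True`, which flags rows present in only one table. This is a sketch under assumptions: the frames, column names and values below are hypothetical, and the actual mismatch in our data may have a different cause.

```python
import pandas as pd

# Hypothetical per-country tables with made-up values.
vaccines = pd.DataFrame({
    "country": ["Finland", "Italy", "Lithuania"],
    "first_dose_pct": [80.0, 85.0, 70.0],
})
deaths = pd.DataFrame({
    "country": ["Finland", "Italy", "Latvia"],
    "mean_excess": [5.0, 12.0, 9.0],
})

# The outer merge keeps every country; the _merge column records
# whether a row came from the left table, the right table, or both.
merged = vaccines.merge(deaths, on="country", how="outer", indicator=True)

# Countries that appear in only one of the two tables.
unmatched = merged.loc[merged["_merge"] != "both", "country"].tolist()

# Rows safe to use in the regression: present in both tables.
analysis = merged[merged["_merge"] == "both"].drop(columns="_merge")
```

Inspecting `unmatched` before fitting shows whether the mismatch comes from spelling, country codes or genuinely missing rows.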
What worked and what did not: The vaccine dataset had duplicate entries across the regional and "all" categories. Finland, Italy and Lithuania in particular caused problems, but we eventually tailored the code to restrict the region to the overall country label and then used drop_duplicates, which gives numbers that mostly match the countries' own published figures.
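The filtering and de-duplication described above might look roughly like the following. The column names (`country`, `region`, `target_group`, `week`, `first_dose`) and the values are hypothetical stand-ins, not the real schema, and the actual logic in cleaning_vaccine.py may differ.

```python
import pandas as pd

# Made-up rows: a duplicated national row, a sub-national row, and a clean row.
raw = pd.DataFrame({
    "country": ["FI", "FI", "FI", "IT"],
    "region": ["FI", "FI", "FI-1", "IT"],
    "target_group": ["ALL", "ALL", "ALL", "ALL"],
    "week": ["2023-W01", "2023-W01", "2023-W01", "2023-W01"],
    "first_dose": [100, 100, 40, 250],
})

# Keep only rows whose region equals the overall country label,
# and only the "ALL" target group.
national = raw[(raw["region"] == raw["country"]) & (raw["target_group"] == "ALL")]

# Drop rows that are exact duplicates of an earlier row.
national = national.drop_duplicates()

# Total first doses per country after cleaning.
totals = national.groupby("country")["first_dose"].sum()
```

Without the region filter, the sub-national row would be added on top of the national figure and inflate the country total.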
What you plan to do next: Follow through with our strategy and adapt along the way based on the hurdles we encounter.
As promised (on Friday), here's your issue asking for a progress check-in.
Could you reply to this issue post with a brief bullet-point summary of a couple paragraphs, letting me know where you are? A really useful summary will: