stat231-s23 / blog1-neuropeeps

Alex Yan & Lily Wickland Shearer

Status Update #1 #2

Open alexandyan opened 1 year ago

alexandyan commented 1 year ago

At this point, we have wrangled one dataset for the unsupervised learning section: the 2020 annual COVID deaths and freedom to make life choices score dataset. We also generated an elbow plot and a scatterplot with clustering to visualize the result, both of which we want to put in our final blog. This took much longer than we expected because we realized that we have to split the wrangling by year, so each scatterplot requires wrangling 3 datasets. As a result, we are a little behind our original timeline. By the next status update, we plan to finish wrangling the clustering datasets and to choose a few countries to analyze further with text analysis.
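For readers unfamiliar with elbow plots, here is a minimal sketch of the idea, not the code used in this project. It assumes k-means clustering on two numeric columns (standing in for deaths and the life-choices score), and it uses a made-up `toy_points` list rather than the real data. All function names are hypothetical. The curve it returns is the within-cluster sum of squares (WSS) for each number of clusters k; the "elbow" is the k after which WSS stops dropping sharply.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centers(points, k):
    """Deterministic farthest-point start (an assumption, chosen for repeatability)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def elbow_curve(points, max_k):
    """Map each k in 1..max_k to its within-cluster sum of squares."""
    curve = {}
    for k in range(1, max_k + 1):
        centers, labels = kmeans(points, k)
        curve[k] = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return curve

# Made-up data: three well-separated groups of four points each.
toy_points = [(0, 0), (1, 0), (0, 1), (1, 1),
              (10, 0), (11, 0), (10, 1), (11, 1),
              (5, 10), (6, 10), (5, 11), (6, 11)]

for k, wss in elbow_curve(toy_points, 4).items():
    print(k, round(wss, 2))
```

On this toy data the WSS falls steeply up to k = 3 and only slightly afterwards, which is the bend an elbow plot is meant to reveal.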

katcorr commented 1 year ago

Ok! Update 1: 5/5

Lwicklandshearer commented 1 year ago

At this point, we have finished our wrangling for the unsupervised learning section for every year (2021 and 2022 in addition to the 2020 dataset wrangled before) and created elbow plots and scatterplots to visualize the clusters for these years. We have also figured out how to view which countries belong to which clusters across years and are ready to begin explaining our findings. For our next step, we are trying to find four countries of interest that do not overlap in any cluster, which we will examine in our text analysis. We are currently looking for sources from which to scrape data for these four countries, but we have run into two roadblocks: needing permission to look at each country's health policies, and figuring out how to cross language barriers. If this doesn't work, we have decided on a backup plan: Wikipedia has a COVID-19 page for most countries, and we could scrape the government policy sections of those pages. We are still a bit behind on our timeline because we had to consider alternate sources for our text scraping and pick countries from our clustered groups (there is a surprising amount of overlap from year to year). We hope to have picked countries by tomorrow night, finish all of our text wrangling over the weekend, and polish our blog draft before presentations on Tuesday.
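The country-selection step can be sketched as a small search. This is only an illustration under one reading of "do not overlap in any cluster": no two chosen countries share a cluster in any year. The cluster assignments below are invented placeholders (countries "A" through "F"), not the project's results, and the function names are hypothetical.

```python
from itertools import combinations

# Placeholder assignments: year -> {country: cluster id}. Not real results.
assignments = {
    2020: {"A": 1, "B": 1, "C": 2, "D": 3, "E": 4, "F": 2},
    2021: {"A": 1, "B": 2, "C": 2, "D": 3, "E": 4, "F": 1},
    2022: {"A": 1, "B": 2, "C": 2, "D": 3, "E": 4, "F": 2},
}

def share_cluster(a, b, by_year):
    """True if countries a and b fall in the same cluster in at least one year."""
    return any(clusters[a] == clusters[b] for clusters in by_year.values())

def pick_non_overlapping(countries, by_year, n):
    """First group of n countries (in sorted order) with no shared cluster in any year."""
    for group in combinations(sorted(countries), n):
        if not any(share_cluster(a, b, by_year) for a, b in combinations(group, 2)):
            return group
    return None  # no such group exists

chosen = pick_non_overlapping(assignments[2020].keys(), assignments, 4)
print(chosen)
```

Because the search checks every group of four, it is only practical for a short candidate list, so filtering to the countries of interest first keeps it fast.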

katcorr commented 1 year ago

OK! Note that the presentation/feedback session isn't until Thursday, so you have a bit more time than you thought if you were thinking Tuesday. In terms of permissions to look at the health policies of each country, do you mean you cannot view the website without permissions? Or does it just not give permission to scrape? I have an idea if it's the latter.

Update 2: 5/5

Lwicklandshearer commented 1 year ago

Awesome, thank you! We're having trouble finding equivalents of the US CDC for countries like China and Russia. Our first thought was that we might not have permission to access those sites. It then occurred to us that we are typing in English into the Google search bar, so we are likely getting results in English only, while those websites may be written in other languages. We do have access (and scraping access) to both the US embassy guides to regulations in other countries and the Wikipedia pages mentioned before. Do you think those would be okay substitutes?

katcorr commented 1 year ago

Oh, I see! Yes, for the purposes of this project, that would definitely be a fine substitute!