stat231-f20 / Blog-HealthAndJusticeLeague

Repository for PUG Blog Project – Health and Justice League
https://stat231-f20.github.io/Blog-HealthAndJusticeLeague

Update 1: Plan #1

Open LillianYKim opened 3 years ago

LillianYKim commented 3 years ago

SEE THE REVISED PLAN BELOW INITIAL PLAN

I. Direction

We want to switch gears from our mid-semester shiny project and now focus on environmental issues. Environmental pollution and climate disasters are closely related to both public health and social justice. Pollution is one of the risk factors for diseases (e.g. respiratory diseases) that disproportionately affect populations of lower socioeconomic classes. Climate disasters can cause ripple effects such as the displacement of peoples, economic loss, homelessness, etc. We will focus on the 4 research questions listed below:

  1. Does climate change affect the global South disproportionately compared to the global North, if at all?

  2. Are the effects of climate change (specifically air pollution) evident in epidemiological prevalence data or mortality data?

  3. How is the occurrence of climate-related events (e.g. natural disasters) related to the public health of communities?

-More specifically, how has the frequency of natural disasters changed throughout the past couple of decades? What do the numbers for homelessness and displacement due to natural disasters look like and how have those numbers changed over the years?

  4. Has air quality/CO2 emission improved since the pandemic?

Not a dataset, but images we could include via a link on the first page of blog to show climate change effects! https://climate.nasa.gov/images-of-change?id=739#739-spring-in-the-kulunda-steppe-russia-kazakhstan-border

The following are datasets we found on the internet. This is only a tentative list of datasets we plan to potentially use, and is subject to change.

  1. Data on infant mortality as relevant to socioeconomic class/region of the world https://academic.oup.com/reep/article/12/1/26/4835833

    would have to scrape tables (check there's permission, this is a published research article)

  2. Many graphs/data available on climate disasters and associated effects https://ourworldindata.org/natural-disasters

    has links to diagrams where you can download data used in the figure
    relevant: direct disaster economic loss, global damage costs from natural disasters, internally displaced persons from natural disasters, death rates from natural disasters (will choose SOME, NOT ALL of these options)
    caveat: figures created on their website might be similar to ones we plan to do

  3. Internal Displacement due to Climate Disasters Worldwide https://www.internal-displacement.org/database/displacement-data

    downloadable excel file
    need to make sense of new displacements vs. total IDPs (numbers don't make sense as of now)

  4. Data on Economic Costs and Deaths due to Specific Climate Disasters (1980 - 2020) https://www.ncdc.noaa.gov/billions/summary-stats

    will need to scrape the necessary tables
    lots of tables, so maybe choose a couple of climate disasters to focus on over the 1980-2020 time period

  5. Global and "regional" sea level data measured by multiple satellite altimeter oceanography mission systems (1992 - 2020)

    global: https://www.star.nesdis.noaa.gov/socd/lsa/SeaLevelRise/LSA_SLR_timeseries_global.php
    regional: https://www.star.nesdis.noaa.gov/socd/lsa/SeaLevelRise/LSA_SLR_timeseries_regional.php
    these datasets are provided in .csv format

  1. Fossil fuel CO2 emission by nation (1751-2014) https://datahub.io/core/co2-fossil-by-nation available as .csv (the website also provides code to directly import the data into R)

1-1. Some interesting datasets on CO2 emission by country, global temperature change, etc. -- there are datasets that we could potentially use (perhaps a third "variable" since we only got 2 from our initial discussion); we could also write some introduction stuff based on the information provided here: https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions

  1. PM10 and PM2.5 air quality data (2010-2016) https://www.who.int/airpollution/ambient/en/ available as .xlsx on the website

  2. Comparing pre and post covid https://aqicn.org/data-platform/covid19/ -provides 2019 Q1, Q2, Q3, Q4 and 2018, 2017, 2016, 2015 H1 data

  1. Welfare cost from air pollution and PM2.5 by country (~2017) https://stats.oecd.org/index.aspx?queryid=72722

    most extensive, can be downloaded from the website in .csv file with variables of choice
    https://stats.oecd.org/Index.aspx?DatasetCode=HEALTH_STAT -can also export health statistics (no specific data on respiratory diseases though, just mortality and morbidity)
    https://academic.oup.com/reep/article/12/1/26/4835833 -some data on infant mortality in different economic groups
  1. Heatwaves http://blogs.edf.org/climate411/2019/01/24/heatwaves-to-become-more-deadly-and-increase-global-inequality/
    • could look at how temperature rise disproportionately impacts different regions in the world; some specific variables we could consider are perhaps a) number of deaths due to extreme heat, b) number of extreme heat days in a year

      didn't get to actually find data on this yet (discuss this together and decide whether this would work or if we already have enough to work with)


II. Describe your intended final products (this is a rough list and is subject to change)

choropleth of world (Bella, Lillian, Mythili) For this product, we are considering using leaflet, but only if it does not overcrowd the map with too much information. Otherwise, this could also be part of a shiny interactive app, in which case the user will be able to select a certain time point (e.g. 2010) and examine how the countries around the world compare in terms of air quality/the extent to which they've suffered from environmental pollution (e.g. the user may choose to compare Beijing in 2000 to Beijing in 2019, or Beijing in 2020 to Paris in 2020)

shiny interactive app (Together) For this product, we plan to allow the user to choose certain environmental variables, public health variables, and regions.


III. Schedule

We plan to have a Zoom meeting every Tuesday, and to communicate synchronously/asynchronously via FB message or GitHub issue every Thursday to update one another on the progress each of us has made. Of course, given the workload and the time given for this project, we expect to communicate outside of the regular check-points as well.

LillianYKim commented 3 years ago

Comment any ideas/resources below!

Idea 1: Further exploring covid stuff

County-level Socioeconomic Data for Predictive Modeling of Epidemiological Effects https://github.com/JieYingWu/COVID-19_US_County-level_Summaries

Covid Severity Forecasting https://github.com/Yu-Group/covid19-severity-prediction

Real time covid dataset https://www.nature.com/articles/s41597-020-0448-0

Johns Hopkins dataset https://github.com/QFL2020/COVID_DataHub

Idea 2: Reproductive rights

Reproductive health/rights data https://www.cdc.gov/reproductivehealth/data_stats/index.htm

Abortion statistics in UK during pandemic

Can create datasets with all sorts of reproductive rights indicators at the state/county level https://data.guttmacher.org/counties

Information on reproductive rights and accessibility to abortion in US states https://statusofwomendata.org/explore-the-data/reproductive-rights/

Idea 3: Crime Rate Changes During Pandemic

https://cdn.ymaws.com/counciloncj.org/resource/collection/D26974EF-0F75-4BDE-ADE7-86DA0741DC49/Impact_Report_-_Crime.pdf

-caveat: might have to specifically look up states' crime rate info

mysubb commented 3 years ago

ncbi article on environmental impacts in 2020 vs. previous years https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459942/

katcorr commented 3 years ago

Great topic idea, and motivation! You've identified some excellent questions to explore, and lots of potential data sources -- I fear this may be too ambitious to try to do it all within the time we have, so I would recommend narrowing your focus. Your planned visualizations make sense, but if they all require different data sources, it may be more appropriate to try to focus on one or two of the data sources you mention.

Love the checklist as a way to plan your schedule! Which you can also update and re-adjust as needed going forward.

Excellent planning, team! I really look forward to your blog post.

Update 1: 10/10

LillianYKim commented 3 years ago

REVISED PLAN

I. Direction

We will focus on 3 research questions that are listed below along with their associated datasets:

  1. Are the effects of climate change (specifically air pollution) evident in epidemiological prevalence data or mortality data?
  2. How has the frequency of natural disasters changed throughout the past couple of decades? + What do the numbers for homelessness and displacement due to natural disasters look like and how have those numbers changed over the years? (How is the occurrence of climate-related events (e.g. natural disasters) related to the public health of communities?)
  3. How do climate change ripple effects show differently in developing (i.e. more agriculture-based) versus developed countries? Specifically, how are infant mortality and maternal mortality affected by the climate change ripple effects? https://academic.oup.com/reep/article/12/1/26/4835833

Other Resources

  1. Not a dataset, but images we could include via a link on the first page of blog to show climate change effects! https://climate.nasa.gov/images-of-change?id=739#739-spring-in-the-kulunda-steppe-russia-kazakhstan-border

  2. Info for blog text later

    how climate change disproportionately affects certain areas agriculturally and how that relates to infant mortality https://academic.oup.com/reep/article/12/1/26/4835833


II. Describe your intended final products (this is a rough list and is subject to change)

shiny interactive app (Together) For this product, we plan to allow the user to choose certain environmental variables, public health variables, and regions.


III. Schedule

We plan to have a Zoom meeting every Tuesday, and to communicate synchronously/asynchronously via FB message or GitHub issue every Thursday to update one another on the progress each of us has made. Of course, given the workload and the time given for this project, we expect to communicate outside of the regular check-points as well.

LillianYKim commented 3 years ago

Lillian: Annual natural disaster by country https://public.emdat.be/data

LillianYKim commented 3 years ago

UPDATE 2: NOV 6TH 9PM


  1. Bella

    • The question I am mainly focusing on is "are the effects of climate change (specifically air pollution) evident in epidemiological prevalence data or mortality data?" Though the initial plan only mentions a shiny app, I have slightly altered the plan: first make some leaflets without shiny interactivity to tell a story about how air quality might be related to public welfare, and then make a shiny app as a supplementary resource. As of today (11/5/20), I have successfully created a leaflet looking at global air quality levels (population exposed to PM 2.5) in 1990, and I plan to make a couple more for other specific years closer to the most recent year's data available (and will repeat this process using the public welfare data). Now that I have a functional leaflet, this should go by pretty fast. I haven't yet decided whether to use all the data I've wrangled before (there are data that specify the proportion exposed to a certain level of PM 2.5, in addition to the data I'm using right now on the population exposed to PM 2.5), but this is something I could either make into a separate set of leaflets or incorporate into the shiny app and let the user select what they'd like to see. I think I am almost perfectly on schedule and don't have too many concerns!
  2. Lillian

    • I was tasked with creating a data visualization for the 2nd question: how has the frequency of natural disasters changed over time, and what are the consequences of natural disasters in each country? I am creating a shiny interactive app that has a leaflet map in it. Through shiny interactivity, the user will be able to choose the timeframe (year: 1950-2020) and the type of natural disaster (e.g. flood, storm, etc.), which will be reflected in the leaflet map. I finished creating a basic ui-server structure for shiny and wrote code for the appropriate input objects (e.g. sliderInput, selectInput). I also successfully created a separate leaflet (without shiny interactivity) showing how severe the impacts of natural disasters are in each country in each year. However, where I am stuck right now is making leaflet work in the shiny interactive environment. I am currently trying to integrate leaflet into shiny by consulting several resources (e.g. http://rstudio.github.io/leaflet/shiny.html). Therefore, my next plan is to troubleshoot the leaflet map problem that I am having right now and to make my app more aesthetically pleasing. As long as I figure out how to solve this problem over the weekend, I don't think I will fall behind schedule.

  3. Mythili

I'm exploring the 3rd question: How do climate change ripple effects show differently in developing (i.e. more agriculture-based) versus developed countries? Specifically, how are infant mortality and maternal mortality related to climate changes/disasters (e.g. extreme temperatures)? The topic came up from an article I read that said climate change can impact certain agriculturally-based societies more than others (production of crops, soil health, etc. can be affected by temperature changes and more). Initially, my plan was to do k-means clustering to see the spread of countries and consider which ones have a higher infant/maternal mortality compared to others. Is there something that distinguishes these clusters (e.g. whether they're classified as developing vs. developed countries)? Currently, I have finished the clustering graphs for two different decades, the 1980s and the 2010s. Data was provided for each year from 1980-2018; however, it would be cumbersome and unnecessary to cluster all of those points (as it would be hard to distinguish multiple variables like year and country). So I averaged values within each decade, and looked at the first and last decade recorded to see the overall change. After viewing the visualizations, they didn't seem to really answer the question as I had envisioned it.

1) The clustering doesn't exactly connect how the infant/maternal mortality could be associated with climate change (since no climate change factor is a variable). This makes it seem kind of isolated from the rest of the tabs.

2) There are only around 40 countries included in the clustering (from the dataset we used), and none of these countries are from Africa or from many parts of Asia, where agricultural societies are common. The article itself was a case study on countries in Africa and how climate-change-induced agricultural damages affected countries and communities in poverty.

So I decided to tack on 2 other visualizations to supplement the clustering and connect my tab more to the general topic of our blog (info not included here because this update is getting very long - see next comment for more information on that!).

I achieved the work I expected, in that I started/finished the visualizations I first planned (k-means clustering graphs). I also made progress on refocusing my question and clarifying my visualizations. I still need to recheck the clustering and see if normalization of the values is necessary. I have also now added 2 visualizations (reactive line graphs) to my plate which I will work on this weekend. I'm still on track, as we wanted to finish the visualizations by Tuesday, but now there's slightly more time pressure to get the visualizations done (especially since they will be in Shiny). I would say the same checkpoint is still applicable to me although I'm more "behind" than I expected. I haven't worked as much on Data Science as I could have in the last few days, although I did continue to do data wrangling. If I had spent more time earlier in the week doing the visualizations I would have realized ahead of time what extra visualizations I would need to do and considered how my tab will answer the question posed. Regardless, I reached that point eventually and I will continue to work on my visualizations so that they're hopefully finished by Tuesday (as planned).

goni99 commented 3 years ago

I wrote this for the update but then realized the prompt doesn't ask about future plans so i'm just gonna save it here for my own records--feel free to do the same by editing this if any of you'd like :D

Bella

Mythili

-Next Steps:

Visualization A) Just looking at the first and last decade of infant/maternal mortality for each country takes out a lot of information. The infant/maternal mortality changes yearly, and sometimes by large amounts. Thus, on top of the clustering (which will show how countries compare in terms of change in mortality rate), I will do a line graph in Shiny where the user can choose the country and whether they want to view the infant or maternal mortality rate. The graph will then display how the rate has changed over the years between 1980 and 2018. To see why this graph enhances my exploration of the question, consider the 2nd additional visualization I have planned (described below).

Visualization B) I will use natural disaster data from the same site as my infant/maternal mortality data. Specifically, I will look at natural disasters that could directly affect agriculture. Some choices are extreme temperature (which I'm leaning toward), floods, wildfires, etc. I will put this graph on the same tab of the Shiny app as my infant/maternal mortality line graph. That way, when the user chooses a subset of countries, they can compare how, say, extreme temperature changed in certain countries between 1980 and 2018 alongside how infant or maternal mortality changed for those same countries over the same time span.

Why these visualizations? What do they add? First I will note the problems I noticed that necessitated more visualizations. What I noticed from the clustering is that all countries had a drop in infant/maternal mortality rates (expected, as newer technology and medicine and somewhat improved infrastructure would contribute to healthier societies). This made it hard to see if climate change would have any impact on the infant/maternal mortality rates. The comparison between the line graphs will allow us to see more clearly whether extreme temperatures (and their changing behaviour) in certain countries could be associated with the change in infant/maternal mortality of those countries. Did it prevent further improvement in the rates, or was there not as much of an effect?
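To make the side-by-side comparison concrete, here is a minimal sketch of how yearly disaster counts could be lined up against a mortality series for one country. It is written in Python purely as an illustration (the project itself uses R and Shiny), and every country name, record layout and number below is a made-up placeholder rather than a value from the actual datasets.

```python
from collections import Counter

# Hypothetical disaster records: (country, year, disaster_type). Placeholders only.
disasters = [
    ("Country A", 1980, "Extreme temperature"),
    ("Country A", 1980, "Flood"),
    ("Country A", 1981, "Extreme temperature"),
    ("Country B", 1980, "Flood"),
]

# Hypothetical infant mortality rates keyed by (country, year). Placeholders only.
infant_mortality = {
    ("Country A", 1980): 30.0,
    ("Country A", 1981): 29.0,
    ("Country A", 1982): 28.0,
}


def yearly_counts(records, disaster_type):
    """Count events of one disaster type per (country, year)."""
    counts = Counter()
    for country, year, kind in records:
        if kind == disaster_type:
            counts[(country, year)] += 1
    return counts


def side_by_side(country, years, counts, mortality):
    """Rows of (year, disaster count, mortality rate); the rate is None when missing."""
    return [(y, counts.get((country, y), 0), mortality.get((country, y))) for y in years]


counts = yearly_counts(disasters, "Extreme temperature")
table = side_by_side("Country A", range(1980, 1983), counts, infant_mortality)
```

Each row of `table` holds what the two graphs would show for the same year, which is the pairing the comparison relies on. It shows association at best, not that one series causes the other.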

katcorr commented 3 years ago

Very thorough update! Glad to hear you're each on track more or less, and are making great progress!

Mythili -- to confirm, so you only have one observation per country contributing to the k-means clustering analysis, right? (e.g., you noted "So I averaged values within each decade, and looked at the first and last decade recorded to see the overall change.", meaning each country is one row and the variables included in the algorithm represent overall change in the various factors from the first to the last decade?)

Update 2: 5/5

LillianYKim commented 3 years ago

UPDATE 3: NOV 10TH 9PM EST


Group-wide initial plan by Tuesday: "ideally must have finished creating all data viz"

Next group-wide check-point: Thursday ("should have the major text portion of the blog and the data viz all completed")

Bella (Q1): I was able to create some more leaflets since the last update, and they are all done except for some minor aesthetic modifications I need to make. However, I am having a more difficult time trying to interpret what the data is showing in a broader context. My initial hypothesis was that the air quality would have gotten worse over the years, which would then lead to an increase in the DALY scores attributed to air quality. This is not what I observed after making the leaflets: based on the data, air quality improved over time, and the DALY scores in general decreased as a logical result. So in this sense, there are possibly some factors that have positively contributed to the air quality over time; still, it is good news that the air quality data itself seems to be consistent with the DALY scores. For further analysis, I plan to wrangle the datasets that I have right now into one dataset containing the following information: air quality measure, DALY scores, year, and whether a country is relatively developed or underdeveloped (developing). Then, I plan to select a few random countries from each group (developed vs developing) and see how DALY scores and air quality measures change over time (i.e. the rate) through a set of faceted line graphs. This would possibly show whether for certain countries the effect of air pollution is more prevalent in the mortality data. I already have some wrangling done, so I think I should be able to finish by Thursday at the latest, which is still for the most part aligned with our initial plan. I have already noted a couple of remarks about the result, so I think I should be able to have a rough draft of the blog post written up by then as well.
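A rough sketch of the wrangling plan above: join the air quality and DALY tables on country and year, attach the developed/developing label, then draw a few random countries from each group. This is Python for illustration only (the project is in R), and all country names, labels and numbers are made-up placeholders, not values from the real data.

```python
import random

# Hypothetical inputs keyed by (country, year); every value is a placeholder.
air_quality = {
    ("Country A", 1990): 40.0, ("Country A", 2017): 30.0,
    ("Country B", 1990): 15.0, ("Country B", 2017): 10.0,
    ("Country C", 1990): 55.0, ("Country C", 2017): 50.0,
}
daly = {
    ("Country A", 1990): 900.0, ("Country A", 2017): 700.0,
    ("Country B", 1990): 300.0, ("Country B", 2017): 200.0,
    ("Country C", 1990): 1200.0, ("Country C", 2017): 1000.0,
}
group = {"Country A": "developing", "Country B": "developed", "Country C": "developing"}


def combine(air_quality, daly, group):
    """Inner join on (country, year), attaching each country's group label."""
    rows = []
    for (country, year), pm25 in sorted(air_quality.items()):
        if (country, year) in daly and country in group:
            rows.append({"country": country, "year": year, "pm25": pm25,
                         "daly": daly[(country, year)], "group": group[country]})
    return rows


def sample_countries(group, per_group, seed=0):
    """Pick up to per_group countries at random from each group (seeded for repeatability)."""
    rng = random.Random(seed)
    chosen = {}
    for label in sorted(set(group.values())):
        members = sorted(c for c, g in group.items() if g == label)
        chosen[label] = rng.sample(members, min(per_group, len(members)))
    return chosen


rows = combine(air_quality, daly, group)
chosen = sample_countries(group, per_group=1)
selected = {c for names in chosen.values() for c in names}
subset = [r for r in rows if r["country"] in selected]
```

Fixing the seed keeps the "random" selection the same every time the blog is rebuilt, so the faceted graphs and the written interpretation stay in sync.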

Lillian (Q2): Previously, I had been having difficulty getting the interactive leaflet map to work in a shiny context. Thanks to Professor Correia, I was able to solve the problem and successfully display the leaflet map in the shiny app. I also made the app more informative, user-friendly and aesthetically pleasing by: a) changing selectInput() to checkboxInput() so that the user can select multiple types of disasters at once; b) moving the ui area to the bottom to allow more space for the leaflet map; c) adding a shiny theme. However, because of the technical problem I had earlier, I was not able to completely finish my shiny app. Right now the only thing left is to add some background information and instructions that may be helpful to the user (e.g. the data source, etc.). I do not think that this delay was substantial enough to adjust the checkpoint, so I would like to keep the schedule as it is. Instead, I will work on the PUG project intensively until our next group-wide checkpoint, which is this Thursday. First, I will finish my shiny app by adding the last touches mentioned above. Then, specifically on Wednesday and Thursday, I will focus on writing the text portion of the blog. This may be challenging because there are more missing data on my choropleth than I expected. Therefore, I will spend sufficient time exploring different combinations of variables on my choropleth to see if there are any interesting relationships that stand out.
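The filtering that sits behind the year slider and the multi-select of disaster types can be sketched roughly like this. It is plain Python rather than the actual Shiny server code, the record layout is an assumption, and `people_affected` is a hypothetical stand-in for whatever severity measure the map uses; all values are placeholders.

```python
# Hypothetical disaster records: (country, year, disaster_type, people_affected).
events = [
    ("Country A", 1995, "Flood", 1000),
    ("Country A", 2005, "Storm", 500),
    ("Country B", 2005, "Flood", 2000),
    ("Country B", 2015, "Drought", 300),
]


def filter_events(events, year_range, selected_types):
    """Keep events inside the inclusive year range whose type is among the selected ones."""
    start, end = year_range
    wanted = set(selected_types)
    return [e for e in events if start <= e[1] <= end and e[2] in wanted]


def total_by_country(events):
    """Sum people affected per country; this is the value a choropleth would colour by."""
    totals = {}
    for country, _year, _kind, affected in events:
        totals[country] = totals.get(country, 0) + affected
    return totals


shown = filter_events(events, year_range=(2000, 2020), selected_types=["Flood", "Storm"])
totals = total_by_country(shown)
```

Countries with no matching events simply do not appear in `totals`, which is one way gaps end up on the map for some combinations of inputs.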

Mythili (Q3): I started creating the new visualizations I had planned in the last update. I aimed to finish the visualizations by today, and I believe I will be able to (I just need to do the last one). Specifically, for Visualization 1 (Clustering), I standardized the cluster variables so that the clustering is done more accurately and weights each variable equally. For Visualization 2, I completed the Shiny code (an interactive bar graph - note this is a change from the line graph I had initially planned). I coded user interactivity in choosing the countries and choosing the variable displayed (either net change in infant mortality or net change in maternal mortality). For Visualization 3, I planned out more specifically what I was going to do. Initially I thought I would be able to graph extreme temperatures over time; however, once I looked at the data set, I realized it just had the frequency of various natural disasters rather than the actual temperatures. Thus, I decided to switch gears and graph the frequency of different natural disasters between 1980 and 2018 to compare in tandem with the net change in mortality. This way, my visualizations will more clearly answer the question of whether climate change-induced natural disasters are affecting the change in mortality in recent years.
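As a small illustration of why standardizing matters before k-means: distance-based clustering is dominated by whichever variable has the larger numeric range, and converting each variable to z-scores puts them on the same footing. The sketch below is in Python rather than R, uses placeholder numbers, and uses the population standard deviation, which is an assumption and not necessarily what was done in the project.

```python
from statistics import mean, pstdev

# Hypothetical (infant_mortality, maternal_mortality) averages per country.
# The two variables are deliberately on very different scales.
points = {
    "Country A": (10.0, 100.0),
    "Country B": (20.0, 300.0),
    "Country C": (30.0, 500.0),
}


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


countries = list(points)
infant_z = standardize([points[c][0] for c in countries])
maternal_z = standardize([points[c][1] for c in countries])
scaled = {c: (infant_z[i], maternal_z[i]) for i, c in enumerate(countries)}
```

After scaling, both variables have mean 0 and standard deviation 1, so neither one drives the cluster assignments just because of its units.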

I'm still on track with the schedule as long as I get Visualization 3 done today. However, there might be some spillover in terms of aesthetic edits I might want to make/making the visualizations clearer. The reason I haven't done this yet is that I wasn't able to work on the PUG Project as much this weekend (I was catching up on the reading, doing the lab, and doing homework for other classes). Thus, the questions and considerations of what would be the best visualization are coming up later than initially planned. This time around, I've found that I'm taking a more dynamic approach in doing the visualizations and being open to changing them radically/adding new ones to best answer the question.

In terms of adjusted checkpoints for myself, I propose the following:

Tuesday 11/10:
-Finish Shiny code for Visualization 3
-Restrict years shown on bar graphs so visualizations are easier to digest
-Consider whether to graph overall change from 1980-2018 in cluster graph
-Consider whether to switch to gapminder dataset in R

Wednesday 11/11:
-Transfer code for visualizations into Blog index.Rmd file and make sure there are no errors
-Review blog post requirements
-Start planning out text for Q3 tab (framing the question, how we answered the question, calculations/evidence for claims?)
-Meet with Lillian and Bella to split up work on cover page of Blog + review Blog post requirements

Thursday 11/12:
-Finish bare bones of text with citations for papers referenced/data packages used
-Figure out which code I want visible on tab
-Do necessary work on Cover page (add hyperlinks, intro, images)

Friday 11/13 + Weekend:
-Finish Blog text, finalize citations
-Add aesthetics in
-Troubleshoot
-Practice presentation alone + with Lillian and Bella (Sunday/Monday)

mysubb commented 3 years ago

katcorr wrote: "Mythili -- to confirm, so you only have one observation per country contributing to the k-means clustering analysis, right? (e.g., you noted "So I averaged values within each decade, and looked at the first and last decade recorded to see the overall change.", meaning each country is one row and the variables included in the algorithm represent overall change in the various factors from the first to the last decade?)"

I made two different cluster graphs - One is for just the 1980s decade, in which each country has their average infant mortality (x-axis) and average maternal mortality (y-axis) graphed. The other graph is for the 2010s, in which, again, each country has their average infant mortality (x-axis) and average maternal mortality (y-axis) graphed. The average was calculated from the yearly values for that specific decade, so the average values in the 1980s graph are an average of the values from 1980, 1981, 1982, ..., 1989. So I used two separate tables, one per cluster graph. Now that I think about it, doing a cluster showing the overall change, with the table you mentioned, might be more useful and comprehensive, so I will consider whether to change what I have currently!
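For reference, the decade averaging described here, and the one-row-per-country "overall change" alternative, could be sketched as follows. This is an illustration in Python rather than the R code actually used, and the country names, record layout and numbers are made-up placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical yearly records: (country, year, infant_mortality, maternal_mortality).
records = [
    ("Country A", 1980, 30.0, 100.0),
    ("Country A", 1989, 20.0, 80.0),
    ("Country A", 2010, 10.0, 40.0),
    ("Country A", 2018, 6.0, 20.0),
    ("Country B", 1985, 50.0, 200.0),
    ("Country B", 2015, 30.0, 120.0),
]


def decade_averages(rows):
    """Average each variable per (country, decade); years 1980-1989 map to 1980."""
    grouped = defaultdict(list)
    for country, year, infant, maternal in rows:
        decade = (year // 10) * 10
        grouped[(country, decade)].append((infant, maternal))
    return {
        key: (mean(v[0] for v in values), mean(v[1] for v in values))
        for key, values in grouped.items()
    }


def net_change(averages, first=1980, last=2010):
    """One row per country: change in each decade average from first to last decade."""
    changes = {}
    for (country, decade), (infant, maternal) in averages.items():
        if decade != first or (country, last) not in averages:
            continue
        last_infant, last_maternal = averages[(country, last)]
        changes[country] = (last_infant - infant, last_maternal - maternal)
    return changes


averages = decade_averages(records)
changes = net_change(averages)
```

`averages` corresponds to the two separate per-decade tables, while `changes` is the single table of overall change with one row per country.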

katcorr commented 3 years ago

Excellent progress, team! Thorough update.

@goni99 -- I hadn't understood yesterday when we were talking that it was actually that air quality has improved over time. Can you remind me what years you're looking at?

@mysubb -- thanks for the clarification about the clustering. What you did makes sense too, so feel free to leave it as is!

Update 3: 5/5

goni99 commented 3 years ago

@katcorr To answer your question itself, I was looking at 1990 to 2005 and 2017. To explain a bit of what was bothering me, the major concern I had was essentially that the data was generally showing something opposite to what I had hypothesized, and I wasn't quite sure whether it'd be okay to simply explain what I saw and not conduct a further investigation on why that might be the case by, say, analyzing other data and potential factors. Specifically, I had expected that the air quality would have gotten worse over the years (increase in the magnitude of PM2.5) given that we are experiencing drastic climate change, but it had actually become better (decrease in PM2.5) in general. As far as I remember, I still do see a logical relationship between the air quality measure and other variables that I think I should be able to explain, so based on what you advised us to do last class, I think I will stick to explaining what the data showed, try out a different additional visualization as explained, and perhaps do some research and propose possible explanations for such results to include in the written part of the post. Thank you!

-Bella

Sorry if that was too long D:

katcorr commented 3 years ago

@goni99

Yeah, okay, that does seem opposite of what I would have expected as well given the time frame. And, you're right, if this were not a class project, you probably would want to investigate further with additional data (does the same data from different sources match up?) and factors (e.g. other measures of pollution). But, given the limited timeframe we have, that could be outside the scope of this class project. It would be appropriate to note in your conclusions all of these thoughts you have (e.g., your surprise at the results, future research directions to investigate further, etc.)