Get Data Assignment

huq23 commented 4 months ago

@AaronGullickson Professor, I am having trouble reading the data into R in the organize_data.qmd file. I do not know why I am unable to read the data using the read_xlsx function. Please help me in this regard.

AaronGullickson commented 4 months ago

Hi @huq23. I can read them in all right, but they are not properly formatted to be read by a computer; they are made to be read by a human as a final step. If you are downloading these from Social Explorer, you need to be sure to pick the correct option to get them downloaded as a flat CSV file. I can't access Social Explorer right now to walk you through it, but it's the same process as we used last term.
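
For comparison, here is a minimal sketch of what reading a flat CSV looks like. The file contents and column names (state_abbr, total_pop, median_income) are made up for illustration and are not the actual Social Explorer export; the point is only that a flat file has one header row and one record per line, so it reads in without skipping title rows or untangling merged headers.

```r
library(readr)

# Hypothetical stand-in for a flat CSV export: one header row, one row per state.
# All names and numbers here are invented for the example.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("state_abbr,total_pop,median_income",
             "OR,4100000,60000",
             "WA,7400000,70000",
             "CA,39000000,71000"),
           csv_path)

# A flat file reads straight into a tibble with typed columns
acs <- read_csv(csv_path, show_col_types = FALSE)
```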

AaronGullickson commented 4 months ago

Hello @huq23. I see that you are trying to use the data that you have, but this is leading to some really difficult and awkward coding because these files are not meant for machine reading. Please remove these files and get data that is machine readable.

To get ACS data, you should use the Social Explorer site just like we did last term.
Getting crime data by state is a little more difficult because the UCR data site is terrible. However, they do provide the ability to scrape it directly via an API. You will need to get your own API access key from here. The code below should get you what you want:

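# packages used below; curl only needs to be installed because it is called as curl::curl()
library(tidyverse)
library(glue)
library(rvest)
library(jsonlite)

# replace this with your API key
api_key <- "PUT API KEY HERE"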

# choose starting and ending year
start_year <- 2017
end_year <- 2019

url_base <- "https://api.usa.gov/crime/fbi/cde/estimate/state/{state_abbr}/{type}?from={start_year}&to={end_year}&API_KEY={api_key}"

crime_data <- as_tibble(expand.grid(state_abbr=c(state.abb, "DC"),
                                    type=c("violent-crime","property-crime")))

urls <- crime_data |> glue_data(url_base)

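# request each URL, parse the JSON in the response, and stack the first
# element of `results` from every request into one data frame
rates <- map_dfr(urls, function(url) {
  results <- (curl::curl(url) %>% read_html() %>% html_nodes("p") %>%
                html_text() %>% fromJSON())$results[1] %>%
    bind_rows()
  return(results)
})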

crime_data <- bind_cols(crime_data, rates)

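# now reshape to one row per state, type, and year
# (the year columns follow start_year and end_year rather than being hard-coded)
crime_data <- crime_data |>
  pivot_longer(cols = all_of(as.character(start_year:end_year)),
               names_to = "year", values_to = "rate")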

# now reshape again to get violent and property on the same line
crime_data <- crime_data |>
  mutate(type = str_remove(type, "-crime")) |>
  pivot_wider(id_cols = c(state_abbr, year), names_from = type, values_from = rate)

You will have to install the curl package and load the glue, rvest, and jsonlite packages in addition to the usual tidyverse (which already loads stringr).
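
The URL templating and the two reshaping steps in the code above can be tried offline on a toy table first. This is only a sketch: the URL is a placeholder (example.org), the rates are made-up numbers rather than anything returned by the API, and the toy_ names are hypothetical.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)
library(glue)

# Toy grid: two states by two crime types (the real code uses all states plus DC)
toy <- as_tibble(expand.grid(state_abbr = c("OR", "WA"),
                             type = c("violent-crime", "property-crime"),
                             stringsAsFactors = FALSE))

# glue_data() fills the {placeholders} from the columns of each row
toy_urls <- toy |> glue_data("https://example.org/{state_abbr}/{type}")

# Made-up rates standing in for the API results, one row per URL
toy_rates <- tibble(`2017` = c(1, 2, 3, 4), `2018` = c(5, 6, 7, 8))

# Long format: one row per state, type, and year
toy_long <- bind_cols(toy, toy_rates) |>
  pivot_longer(cols = c(`2017`, `2018`), names_to = "year", values_to = "rate")

# Wide format: violent and property rates side by side for each state and year
toy_wide <- toy_long |>
  mutate(type = str_remove(type, "-crime")) |>
  pivot_wider(id_cols = c(state_abbr, year), names_from = type, values_from = rate)
```

Printing toy_urls, toy_long, and toy_wide in turn shows what each step contributes: four URLs, then eight long rows, then four rows with a violent and a property column.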

huq23 commented 3 months ago

@AaronGullickson: Dear Professor, thank you for the code to get the crime data. For the Social Explorer data, since I am focusing on the years from 2017 to 2019, I have individually downloaded data for 2017, 2018, and 2019. I have also modified my earlier code, tried to make it simpler, and joined the data accordingly. Could you please check my code and let me know if I am on the right track?

AaronGullickson commented 3 months ago

It looks much better. It looks like you are still using the xlsx results for the ACS rather than the more machine readable CSV results. At this point, I wouldn't worry about it, but you have done more work than you should need to do to make that data format work for you.
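
For reference, stacking per-year CSV downloads can be short when the files are flat. The sketch below writes three tiny made-up files so that it runs on its own; the file names (acs_2017.csv and so on) and the columns are hypothetical, not the actual Social Explorer layout.

```r
library(readr)
library(purrr)
library(dplyr)

years <- 2017:2019

# Write three tiny made-up files so the sketch is self-contained;
# in practice these would be the per-year CSV downloads
tmp_dir <- tempfile("acs_")
dir.create(tmp_dir)
for (yr in years) {
  writeLines(c("state_abbr,total_pop", "OR,100", "WA,200"),
             file.path(tmp_dir, paste0("acs_", yr, ".csv")))
}

# Read each year's file, tag it with its year, and stack the results
acs <- map_dfr(years, function(yr) {
  read_csv(file.path(tmp_dir, paste0("acs_", yr, ".csv")),
           show_col_types = FALSE) |>
    mutate(year = as.character(yr))
})
```

The year column is stored as character here because pivot_longer() produces character year labels by default, which matters if the two tables are later joined on state_abbr and year.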

huq23 / SOC513_Project

Get Data Assignment #2