jsoma / data-studio-projects


Natural disasters in USA #122

Open ElinaMak opened 6 years ago

ElinaMak commented 6 years ago

Please complete all of the following sections, or the ghost of Joseph Pulitzer will spookily dance around your issue! A completed version of this template can be found at https://github.com/jsoma/data-studio-projects/issues/1

Pitch

My questions:
- What kinds of natural disasters occur in the USA (fires, floods, tornadoes, mudslides, earthquakes, hurricanes, etc.)? Which ones are the most frequent?
- Are the affected areas populated?
- Is there a type of catastrophe that recurs in a particular area or state? If so, should there be state indemnity to compensate local citizens for their losses? Should taxpayers pay for the repeated catastrophic losses of people who insist on living in an environment that they know for certain will be destroyed?

Summary

Then I contacted the Federal Emergency Management Agency, which gave me access to its "Disaster declaration" database. My first thought was to scrape the pages, but I soon realized that some terms were unknown to me. For instance, fires are categorized, some have a certain type and name, hurricanes have names, and so on, so it was not clear which values should go into which columns of the pandas DataFrame. Due to time constraints, I limited my research to the natural disasters DECLARED in fall 2017 and in 2018, and ended up with 102 entries. While building the dataset, I had to search for each event to understand the type of disaster, because in several cases the event appears in the database like this: Arizona 89 East Fire. Moreover, some entries contain numbers that I could not identify as a code for the state or for the type of disaster, e.g. Texas 335 Fire (FM-5234). One should be careful, because some events happened in 2015 but were "declared" (the term used by the Agency) much later, months or even years afterwards, as is the case for Alaska.
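The two titles quoted above can be split into a name, an optional code, and a guessed disaster type with a small regex. This is only a sketch under assumptions: it assumes the code, when present, is a two-letter prefix plus digits in parentheses at the end of the title ("FM" looks like a FEMA declaration-type prefix for fire management assistance), and the keyword list and the `parse_title` helper are hypothetical. The numbers inside the name (89, 335) are left untouched because their meaning is not clear from the data.

```python
import re

# Assumption: an optional code such as "(FM-5234)" closes the title.
TITLE_RE = re.compile(r"^(?P<name>.*?)\s*(?:\((?P<code>[A-Z]{2}-\d+)\))?\s*$")

# Hypothetical keyword list; extend it after inspecting the real titles.
TYPE_KEYWORDS = ["fire", "flood", "tornado", "mudslide", "earthquake", "hurricane"]


def parse_title(title):
    """Split a title into its name, optional code, and a keyword-based type guess."""
    match = TITLE_RE.match(title.strip())
    name = match.group("name")
    lowered = name.lower()
    # First keyword found in the name, or None when nothing matches.
    kind = next((kw for kw in TYPE_KEYWORDS if kw in lowered), None)
    return {"name": name, "code": match.group("code"), "type": kind}


for title in ["Arizona 89 East Fire", "Texas 335 Fire (FM-5234)"]:
    print(parse_title(title))
```

Titles where no keyword matches come back with `type` set to `None`, so they can be listed and classified by hand instead of being guessed.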

Details

Possible headline(s):

Data set(s):

I created my own dataset. Although I checked for official statistics at Disasters | Data.gov, the available datasets were not clear, and whenever I tried to open the PDFs I got the message "Access denied". I emailed the Agency but so far have had no answer.

So I decided to be more creative in my way of finding data FAST:

[Screenshot, 2018-07-09]

Code repository:

https://github.com/ElinaMak/data_studio/blob/master/01-dogs/Natural%20Disasters%20in%20USA-Copy1.ipynb

Possible problems/fears/questions:

Work so far:

[Screenshot, 2018-07-11]

Checklist

This checklist must be completed before you submit your draft.

ElinaMak commented 6 years ago

[Screenshots, 2018-07-11]

ElinaMak commented 6 years ago

[Screenshot, 2018-07-11]

christina10211 commented 6 years ago

I really like the angle you picked and all the effort you put into assembling the data. For the three analyses you've done, you could consider graphics other than bar charts to represent your data (for instance, maybe a pie chart for the most common disasters in the US?).

angelareplica commented 6 years ago

Interesting idea! It would be interesting if there were a way to measure how destructive these disasters were (e.g. which states have fires that destroyed the most property or land).

ElinaMak commented 6 years ago

Update

For the 2nd draft of my project, I actually tried to scrape the website that lists the natural disasters. So there is not much new regarding visualizations; instead I went back to the basics of programming (where more practice will help): scraping, for loops, regex.

I managed to scrape 1,000 events this time (instead of the 100 events in my initial dataset). Hmm, an interesting process with several pitfalls along the way...

Challenges

First, it was not a good idea to scrape the region + type of the incident separately from the date of the incident, even though they are on two different lines on the website. I should have scraped the single div class that contains both. But I will keep all those steps in the notebook.
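A minimal sketch of the "one container div" idea, using only the standard library so it runs without extra installs. The class names `event`, `title` and `date`, the `EventParser` class, and the sample HTML are all invented for illustration; the real page will use different markup. With Beautiful Soup the same idea would be to select each container first and then look up both fields inside it.

```python
from html.parser import HTMLParser


class EventParser(HTMLParser):
    """Collect one record per container div so title and date stay paired."""

    def __init__(self):
        super().__init__()
        self.events = []      # finished records, one per container
        self._stack = []      # class names of the divs currently open
        self._current = None  # record being filled, or None outside a container

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        cls = dict(attrs).get("class") or ""
        self._stack.append(cls)
        if cls == "event":  # hypothetical container class
            self._current = {"title": "", "date": ""}

    def handle_endtag(self, tag):
        if tag != "div" or not self._stack:
            return
        cls = self._stack.pop()
        if cls == "event" and self._current is not None:
            self.events.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is None or not self._stack:
            return
        field = self._stack[-1]
        if field in self._current:  # only text inside "title" or "date" divs
            self._current[field] += data.strip()


# Invented sample markup; the second event deliberately has no date.
SAMPLE = """
<div class="event">
  <div class="title">Example County Flood</div>
  <div class="date">2018-01-01</div>
</div>
<div class="event">
  <div class="title">Example Ridge Fire</div>
</div>
"""

parser = EventParser()
parser.feed(SAMPLE)
print(parser.events)
```

Because each record is built inside its own container, a missing date leaves an empty field in that record instead of shifting every later date up by one row, which is the risk when titles and dates are scraped as two separate lists.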

Scraping the 50 pages with Beautiful Soup was also a challenge: the URL of the first page has no page number, so I scraped it separately and then incremented the page number in the URL for the remaining pages.
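One way to avoid special-casing the first page is to build the full list of URLs up front and run the same scraping loop over all of them. The sketch below assumes the page number is passed as a query parameter named `page` and that numbering starts at 1 on the second page; both are guesses, and `page_urls` and the example.org address are placeholders, not the real site.

```python
def page_urls(base_url, n_pages, param="page", first_number=1):
    """Return n_pages listing URLs; the first one carries no page number."""
    separator = "&" if "?" in base_url else "?"
    urls = [base_url]
    for number in range(first_number, first_number + n_pages - 1):
        urls.append(f"{base_url}{separator}{param}={number}")
    return urls


# Placeholder address; 50 pages as in the update above.
urls = page_urls("https://example.org/disasters", n_pages=50)
print(len(urls), urls[0], urls[1], urls[-1])
```

Each URL in the list can then be fetched and parsed by the same function, so the first page no longer needs its own code path.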

kellykiki commented 6 years ago

You did a lot of work scraping and transforming the data! I would probably include only one year in the dataset (for example, just 2017) rather than mix parts of different years (some months of 2017 and some months of 2018). Looking forward to the final visualizations!

nickospi commented 6 years ago

Very interesting idea involving a lot (A LOT) of scraping and cleaning work. I would elaborate more on the graphics part just to finish up for now, and come back to the rest in the future. There can be some very interesting data visualisations out of this dataset.

playfairbot commented 6 years ago

Hi! I'm a little robot, here for a surprise inspection.

You need some feedback, let me summon @mattrehbein, @SimoneLuc, @hakantan for you

It looks like we need to fix up your update a little bit! Edit it by clicking the pencil in the top right-hand corner. It requires:

Maybe you just didn't use the template? If not, edit your comment, cut and paste the template in, and then fill it out.

mattrehbein commented 6 years ago

It's a really interesting and also a really big topic! For the purposes of the project, it would probably be easier to narrow the focus. Your last graph seems to be getting at a more specific interesting angle, but without an exact title I'm not totally sure what it's showing. But I like the combination of 'these states have the most natural disasters' and 'here's the most common type of disaster in those places.'

And I agree with above comments that a measure of how deadly some of the disasters are would really help give readers an understanding of the impact.

Nice job chasing a lot of data!

ElinaMak commented 6 years ago

Here is the last update regarding the code: Too much scraping!

https://github.com/ElinaMak/data_studio/blob/master/Natural%20Disasters%20in%20USA%20with%20Scraping.ipynb

And I am still collecting data regarding the casualties..

[Image: states]

[Image: most common type]

hakantan commented 6 years ago

Just by reading it, this seems to be a huge project with regards to getting the data in the first place, so kudos for that.

sarahslo commented 6 years ago

you've done a lot of work with the data. good journey to have to sort through all that. it's never as simple as it seems. so keep in mind that california and texas are big states. big states have more of...everything. so you want to perhaps normalize some of the data by population otherwise all we get are...big states, lots of disasters.

once you've done that (divided by population to make it a per 1,000 people or per 10,000 people figure), maybe you can tell us which states have the most...fires, floods as a percentage of total disasters. that way we can see which state is most likely to have a tornado for instance. if you mapped this, you'd see there is a part of the us that is tornado prone. it's actually called tornado alley.
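a rough sketch of both suggestions in plain Python: disasters per 10,000 residents, and each disaster type as a share of a state's total. the state names, event list and population figures are made-up toy values, not real data, and `per_capita` and `type_shares` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Toy data, invented for illustration only: (state, disaster type) pairs.
events = [
    ("State A", "fire"), ("State A", "fire"), ("State A", "flood"),
    ("State B", "tornado"), ("State B", "flood"),
]
# Invented population figures.
population = {"State A": 2_000_000, "State B": 500_000}


def per_capita(events, population, per=10_000):
    """Number of disasters per `per` residents for each state."""
    counts = Counter(state for state, _ in events)
    return {state: counts[state] * per / population[state] for state in counts}


def type_shares(events):
    """Share of each disaster type within a state's own total."""
    by_state = defaultdict(Counter)
    for state, kind in events:
        by_state[state][kind] += 1
    return {
        state: {kind: n / sum(kinds.values()) for kind, n in kinds.items()}
        for state, kinds in by_state.items()
    }


print(per_capita(events, population))
print(type_shares(events))
```

in the toy data State A has more disasters in total, but State B has more per resident, which is the kind of reversal that normalizing by population is meant to surface.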

i like that you are curious about this, but i think you may want to narrow in on something here. a type of disaster, for instance, or an analysis of certain parts of the country. it's very broad. what interests you most here? give us more of that.

good work, keep at it.

ElinaMak commented 6 years ago

Final comment

What did you find to be the most difficult part of this project?

The scraping!

Are you satisfied with what you produced? Is there anything you would like to change or improve?

I concentrated on the scraping and not on the graphic presentation of the data. With more time for research, I would look further into the hectares destroyed and the casualties.