[PROJECT FOUR] The Amazing How-Would-You-Die Quiz

honjy commented 8 years ago

This project is based on Pitch #169

I've found and combined hundreds of CSV files that list causes of death as a rate per 100,000 people, broken down by country, age, and gender. I will now create a quiz.

The problem is my dataset is wayyyy too large. Any ideas how I can work around this?

[x] My pitch has been approved (see PITCHING.md)
[x] My story issue links to my pitch issue
[ ] I link to my finalized (ish) data source(s)
[x] I've included a brief summary of my story
[x] I've included some possible headlines or findings
[x] I've included some links or images as inspiration (if you have any)
[ ] I have received two comments of peer feedback
[x] I've included an update of my visualization/story in a comment
[ ] I have received two comments of peer feedback after posting an update
[ ] I have received editorial feedback

playfairbot commented 8 years ago

Hi there, I'm the Playfair Bot!

Thanks for posting your story issue, but would you mind editing the original issue to add the first draft of your image? You have my sincere apologies, but it's easier for dumb robots like me when the comments are only used for updates.

Thanks! :pray:

ghost commented 8 years ago

Super cool idea!

re: your question about dataset size, I ran into this with my trees project. Why not just limit the nrows to something manageable while you work on your code, and once you get it working to your satisfaction, scale up bit by bit (nrows = 500, then 1000, then 5000, then 10,000), addressing any problems with cleaning as you go?
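
Here's roughly what I mean, as a minimal sketch. The inline sample data and its column names are made up for illustration (I don't know what the real combined file looks like), and `load_sample` is just a hypothetical helper name; the only real piece is the `nrows` argument to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the combined CSV; the column names are assumptions.
SAMPLE_CSV = """country,age_group,gender,cause,rate_per_100k
Kenya,15-49,female,Malaria,12.5
Kenya,15-49,male,Malaria,14.0
France,50-69,male,Stroke,80.1
France,50-69,female,Stroke,61.7
Japan,70+,female,Pneumonia,150.0
Japan,70+,male,Pneumonia,210.4
"""


def load_sample(source, nrows):
    """Read only the first `nrows` data rows (the header is not counted)."""
    return pd.read_csv(source, nrows=nrows)


# Start small while the cleaning code is still changing...
small = load_sample(io.StringIO(SAMPLE_CSV), nrows=3)
print(small.shape)

# ...then raise nrows step by step once it works.
bigger = load_sample(io.StringIO(SAMPLE_CSV), nrows=5)
print(bigger.shape)
```

With the real data you'd pass the file path instead of the `io.StringIO` object. Keep in mind that `nrows` takes rows from the top of the file, so if the file is sorted by country you'll only see the first few countries.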

gcgruen commented 8 years ago

I like the quiz idea -- maybe because I like dark humor. To scale it down, I wouldn't just arbitrarily select x rows; I would pick a continent or region to focus on first. If you don't want to use all indicators (and thereby reduce the amount of data being imported with pandas), I found the usecols=['columnname1', 'columnname2', ...] argument helpful. You can just pass it a list of the column titles, and pd.read_csv will only import the columns you specified.
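
A rough sketch of both ideas together, in case it helps. The sample data, the column names (including a `region` column) and the `load_region` helper are all assumptions for illustration, since I haven't seen the real files. Reading in pieces with `chunksize` is my own addition on top of `usecols`, so that the full file never has to sit in memory at once.

```python
import io

import pandas as pd

# Hypothetical stand-in for the combined CSV; the column names are assumptions.
SAMPLE_CSV = """country,region,age_group,gender,cause,rate_per_100k
Kenya,Africa,15-49,female,Malaria,12.5
France,Europe,50-69,male,Stroke,80.1
Ghana,Africa,50-69,male,Stroke,95.3
Japan,Asia,70+,female,Pneumonia,150.0
"""


def load_region(source, region, columns, chunksize=2):
    """Load only `columns`, keeping rows whose 'region' equals `region`.

    The file is read `chunksize` rows at a time and each piece is filtered
    before being kept, so only the matching rows accumulate in memory.
    """
    pieces = []
    for chunk in pd.read_csv(source, usecols=columns, chunksize=chunksize):
        pieces.append(chunk[chunk["region"] == region])
    return pd.concat(pieces, ignore_index=True)


africa = load_region(
    io.StringIO(SAMPLE_CSV),
    region="Africa",
    columns=["country", "region", "cause", "rate_per_100k"],
)
print(africa)
```

The `region` column has to be in the `usecols` list for the filter to work. For a real file, a much larger `chunksize` (tens of thousands of rows) is probably more sensible than the tiny value used here.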

jsoma / playfair-projects

[PROJECT FOUR] The Amazing How-Would-You-Die Quiz #205