jsoma / playfair-projects

Common repository of projects for Playfair

[PROJECT FOUR] The Amazing How-Would-You-Die Quiz #205

Closed: honjy closed this issue 7 years ago

honjy commented 8 years ago

This project is based on Pitch #169

I've found and combined hundreds of CSV files that list causes of death as rates per 100,000 people, by country, age, and gender. I will now create a quiz.

The problem is my dataset is wayyyy too large. Any ideas how I can work around this?

playfairbot commented 8 years ago

Hi there, I'm the Playfair Bot!

Thanks for posting your story issue, but would you mind editing the original issue to add the first draft of your image? You have my sincere apologies, but it's easier for dumb robots like me when the comments are only used for updates.

Thanks! :pray:

ghost commented 8 years ago

Super cool idea!

re: your question about dataset size, I ran into this with my trees project. Why not just limit nrows to something manageable while you work on your code, and once it works to your satisfaction, scale up bit by bit (nrows = 500, then 1000, then 5000, then 10,000), addressing any cleaning problems as you go?
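In case it helps, here is a rough sketch of the nrows idea. The column names and the in-memory CSV are made up for illustration (I don't know what the real file looks like); with a real file you would pass its path to pd.read_csv instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical stand-in for the combined CSV; the column names are assumptions.
csv_text = "country,age,gender,cause,rate_per_100k\n" + "".join(
    f"Country{i % 5},{20 + i % 60},{'F' if i % 2 else 'M'},Cause{i % 7},{i * 0.5}\n"
    for i in range(10_000)
)

def load_sample(nrows):
    # nrows makes read_csv stop after reading that many data rows
    return pd.read_csv(io.StringIO(csv_text), nrows=nrows)

# Scale up step by step, checking the cleaning code at each size
for n in (500, 1000, 5000, 10_000):
    df = load_sample(n)
    print(n, df.shape)
```

One caveat: nrows takes the first rows of the file, so if the file is sorted by country you may only see a few countries in the small samples.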

gcgruen commented 8 years ago

I like the quiz idea -- maybe because I like dark humor. To scale it down, I wouldn't just select x rows arbitrarily; instead, maybe pick a continent or region to focus on first. If you don't want to use all indicators (and want to reduce the amount of data pandas imports), I found usecols=['columnname1', 'columnname2', ...] helpful. You pass it a list of column titles, and pd.read_csv imports only those columns.
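A minimal sketch of both suggestions together, assuming the data has a region column (it may only have country, in which case you would filter on a list of countries instead). The column names and the three sample rows are invented for the example.

```python
import io
import pandas as pd

# Hypothetical sample data; the real files may use different column names.
csv_text = (
    "country,region,age,gender,cause,rate_per_100k\n"
    "France,Europe,30,F,Heart disease,12.5\n"
    "Japan,Asia,30,M,Stroke,8.1\n"
    "Germany,Europe,60,M,Heart disease,140.2\n"
)

# Only these columns are parsed and kept; the rest are skipped on import
wanted = ["region", "cause", "rate_per_100k"]
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# Focus on one region first
europe = df[df["region"] == "Europe"]
print(europe)
```

Note that usecols cuts down the columns at import time, while the region filter here still runs after the whole file has been read.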