petewarden / dstk

A collection of the best open data sets and open-source tools for data science
http://www.datasciencetoolkit.org/

Size issue #21

Open ghost opened 11 years ago

ghost commented 11 years ago

I'm following your EC2 setup instructions as a guide to install on a non-EC2 server. I noticed the install takes up quite a bit of space. Is there any way to specify which geographical data is used, to save space and bandwidth?

petewarden commented 11 years ago

I've updated the ec2setup.txt documentation to mark the section that's associated with the geo-statistics data, since that's the largest disk-space guzzler. Skipping that section does mean that coordinates2statistics won't work.

Let me know if that helps!

ghost commented 11 years ago

Very helpful, thanks! There were a few times I had to unzip files one at a time and then delete them from the archive so I wouldn't run out of space.
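
For anyone else short on disk, here is a rough sketch of the one-file-at-a-time approach in Python. It is not part of the dstk setup; the function name `extract_with_space_check` and the `reserve_bytes` parameter are made up for illustration. It extracts zip members one by one and stops before a member that would not fit in the remaining free space. It does not delete entries from the archive, since the standard `zipfile` module has no call for that.

```python
import shutil
import zipfile
from pathlib import Path


def extract_with_space_check(archive_path, dest_dir, reserve_bytes=0):
    """Extract zip members one at a time, stopping before the disk fills up.

    Hypothetical helper: returns the names of the members that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            # Re-check free space before every member, since it shrinks as we go.
            free = shutil.disk_usage(dest).free
            # file_size is the uncompressed size recorded in the archive.
            if info.file_size + reserve_bytes > free:
                break
            archive.extract(info, path=dest)
            extracted.append(info.filename)
    return extracted
```

Any member not in the returned list was skipped for lack of space. To reclaim space from the archive itself after each extraction, the Info-ZIP command `zip -d archive.zip name` removes a single entry.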