pghpy / pghpy-talks


Pandas and Big Data #7

Open synapticarbors opened 7 years ago

synapticarbors commented 7 years ago

At the most recent meetup there was some interest in learning how to do "Big Data" with pandas. To start the discussion, I'll frame that as analysis and manipulation of data that is too large to fit comfortably in memory on your laptop/workstation. There are a number of tools for this. The one I'm most familiar with is Dask (http://dask.pydata.org/).
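The core idea behind these tools is to process data in pieces instead of loading it all at once. pandas exposes this through the `chunksize` argument of `read_csv`, and Dask generalizes it by splitting a dataframe into partitions. Below is a minimal standard-library sketch of the pattern, not Dask's or pandas' API. The helper names `iter_chunks` and `chunked_group_sum` are hypothetical, and an in-memory `StringIO` stands in for a large CSV file.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(reader, chunksize):
    """Yield lists of at most `chunksize` rows from a csv reader (hypothetical helper)."""
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        yield chunk


def chunked_group_sum(fileobj, key, value, chunksize=2):
    """Sum `value` per `key` while holding only one chunk of rows in memory."""
    reader = csv.DictReader(fileobj)
    totals = defaultdict(float)
    for chunk in iter_chunks(reader, chunksize):
        # Partial aggregation: each chunk updates the running totals,
        # after which the chunk's rows can be discarded.
        for row in chunk:
            totals[row[key]] += float(row[value])
    return dict(totals)


# Stand-in for a file too big to load at once (assumption: a CSV with a header row).
data = io.StringIO("city,sales\npgh,10\nnyc,5\npgh,2.5\nsf,1\nnyc,4\n")
print(chunked_group_sum(data, "city", "sales"))  # {'pgh': 12.5, 'nyc': 9.0, 'sf': 1.0}
```

A sum can be combined chunk by chunk, so this works. Operations such as a median or a global sort need more coordination between pieces, and that is where a library like Dask earns its keep.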

Anyone interested in planning for a talk/tutorial in 2017?

cc/ @AlbertDeFusco @annafil

AlbertDeFusco commented 7 years ago

Sure, I'll get involved. Dask is a really great tool for getting started with larger-than-memory data.

Other things that can be discussed in the spectrum from tall data to big data:

robert-lucente commented 7 years ago

It is interesting how Dask has a "task graph". This concept of a task graph shows up over and over. Terraform is another example that I recently ran into (github.com/hashicorp/terraform).
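To make the idea concrete: in Dask's graph specification, a graph is a plain dict that maps keys to either values or tasks. A task is a tuple whose first element is a callable and whose remaining elements are arguments, which may name other keys. The toy evaluator below is a sketch that resolves dependencies before dependents, sequentially. It is not Dask's scheduler. It assumes string keys and an acyclic graph, and the `get` function here is a hypothetical stand-in.

```python
from operator import add, mul


def is_task(obj):
    """In a Dask-style spec, a task is a tuple whose first element is callable."""
    return isinstance(obj, tuple) and len(obj) > 0 and callable(obj[0])


def get(graph, key, cache=None):
    """Compute `key` by recursively computing its dependencies first (naive, sequential)."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    node = graph[key]
    if is_task(node):
        func, *args = node
        # String arguments that name other keys are dependencies; resolve them first.
        resolved = [
            get(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        value = func(*resolved)
    else:
        value = node  # a literal value, e.g. "x": 1
    cache[key] = value  # each key is computed at most once per call
    return value


graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),  # depends on x and y
    "w": (mul, "z", 10),   # depends on z; 10 is a literal argument
}
print(get(graph, "w"))  # 30
```

Because dependencies are explicit in the graph, a real scheduler can see that independent tasks (here, `x` and `y`) can run in parallel. Dependency-aware tools such as Terraform rely on the same property.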