hwayne / awesome-cold-showers

For when people get too hyped up about things

Add caveats to Scalability section (memory, storage) #26

Open nfd9001 opened 2 years ago

nfd9001 commented 2 years ago

I read the Scalability entry, and it's a good post. I'd add a couple more caveats (discussed briefly in the article). Not all "big data" scalability problems are about scaling out the number of CPU cores. I've worked on "big data" scaling with Spark before, and I often built out clusters for datasets 10,000-100,000 times the size of the one on McSherry's laptop. The calculus for these sorts of systems tips back towards "the cluster's better" fairly quickly once you're also dealing with bus and memory bounds: Do you have enough memory to hold the data you need in-memory, plus room to receive shuffles? Do you have a local network and NICs that are adequate to run those shuffles in reasonable time? Do you have enough striped fast storage?
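
To make the memory and network questions concrete, here is a rough back-of-envelope sketch. Every number in it is an assumption for illustration only: the 1 GB base size, the 16 GB of laptop RAM, the 50% shuffle headroom, and the 10 Gbit/s NIC are hypothetical, and the helper names are made up. None of them come from the article or from a real cluster.

```python
# Back-of-envelope check: does a dataset fit on one machine, and how long
# would it take to move once over a single network link?
# All constants below are illustrative assumptions, not measurements.

BASE_DATASET_GB = 1.0      # assumed size of the single-machine dataset
LAPTOP_RAM_GB = 16.0       # assumed laptop memory
SHUFFLE_HEADROOM = 0.5     # assumed extra memory (50%) reserved for shuffles
NIC_GBIT_PER_S = 10.0      # assumed NIC line rate


def fits_in_memory(dataset_gb, ram_gb, headroom=SHUFFLE_HEADROOM):
    """Return True if the dataset plus shuffle headroom fits in RAM."""
    return dataset_gb * (1.0 + headroom) <= ram_gb


def transfer_seconds(dataset_gb, nic_gbit_per_s=NIC_GBIT_PER_S):
    """Seconds to push the whole dataset once through one NIC at line rate."""
    return dataset_gb * 8.0 / nic_gbit_per_s


if __name__ == "__main__":
    for factor in (1, 10_000, 100_000):
        size_gb = BASE_DATASET_GB * factor
        print(
            f"x{factor}: {size_gb:,.0f} GB, "
            f"fits in laptop RAM: {fits_in_memory(size_gb, LAPTOP_RAM_GB)}, "
            f"one pass over the NIC: {transfer_seconds(size_gb):,.0f} s"
        )
```

Under these assumptions the base dataset fits comfortably in memory, while the scaled-up sizes do not, which is where memory, network, and storage start to dominate the decision rather than core count.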

I'd add the 1G (still fairly large, sure) dataset size to the Shower part and explain that the post is largely a warning against overengineering and premature optimization.

hwayne commented 1 year ago

These are good ideas