luminousmen / luminousmen.com


https://luminousmen.com/post/spark-tips-partition-tuning #13

Closed utterances-bot closed 3 weeks ago

utterances-bot commented 4 years ago

Spark Tips. Partition Tuning - Blog | luminousmen

Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Here are some partitioning tips.

https://luminousmen.com/post/spark-tips-partition-tuning

gogi2811 commented 4 years ago

Brother, very deep and informative article. I need one bit of help: I also want to go deep into Spark and PySpark, but the only way I see is to get a project at my job to work on. Would you be able to recommend some other way to get expertise in Spark, not just going through some courses? Any help would be highly appreciated.

luminousmen commented 4 years ago

Hi @gogi2811, thanks for the feedback! As for your question, I can think of several options:

  1. Do a side project; your only limit here is your imagination. For example, create a system to collect Twitter posts with pyspark and maybe do some analysis (see the sketch after this list).
  2. For me, a more interesting idea would be to take part in a kaggle.com competition. Mostly it's DS/ML, but you can do some EDA, calculate statistics with pyspark, aggregate data, and build charts.
  3. If you are an open source enthusiast, you can go to GitHub, find projects that use pyspark, and contribute to them.
  4. If you want to go deeper into Spark itself, you can contribute to the Spark project directly. Of course, Spark code is mostly in Scala, but there are Python parts.
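
For the Twitter idea in item 1, here is a minimal PySpark sketch of what the analysis half of such a project could look like. The input path and the `lang`/`text` fields are assumptions about how the collector would store the tweets, and the `spark.sql.shuffle.partitions` setting is just a toy value that ties back to the partition-tuning post above.

```python
# Minimal sketch: analyze a batch of collected tweets with PySpark.
# Assumes tweets were already collected (e.g. via the Twitter API) and
# saved as JSON lines with hypothetical fields "lang" and "text".
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tweet-analysis")
    # Keep shuffle partitions small for a toy-sized dataset
    # (see the partition tuning post above for how to size this).
    .config("spark.sql.shuffle.partitions", "8")
    .getOrCreate()
)

# Hypothetical input path; adjust to wherever the collector writes.
tweets = spark.read.json("data/tweets/*.json")

# Simple EDA: tweet counts and average tweet length per language.
stats = (
    tweets
    .withColumn("length", F.length("text"))
    .groupBy("lang")
    .agg(
        F.count("*").alias("tweets"),
        F.avg("length").alias("avg_length"),
    )
    .orderBy(F.desc("tweets"))
)

stats.show(20, truncate=False)

# Coalesce before writing so a small result is not split
# across many tiny output files.
stats.coalesce(1).write.mode("overwrite").csv("output/tweet_stats", header=True)

spark.stop()
```
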
gogi2811 commented 4 years ago

Hi @luminousmen, thank you very much for your help and guidance. I'll start with the Twitter analysis project and take it deeper. I'll keep coming back to you for guidance. Thanks a lot!

zizhaof commented 4 years ago

Well summarized!