xxleyi / learning_list

A collection of my own study notes

Pyspark Pi #152

Open xxleyi opened 5 years ago

xxleyi commented 5 years ago
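import os
import subprocess

# macOS only: point JAVA_HOME at a Java 1.8 install so Spark can start its JVM.
os.environ["JAVA_HOME"] = subprocess.check_output(["/usr/libexec/java_home", "-v", "1.8"]).decode().strip()

import random

import findspark

# Call findspark.init() before importing pyspark so the Spark installation is on sys.path.
findspark.init()

import pyspark

sc = pyspark.SparkContext(appName="Pi-1")
num_samples = 100000000

def inside(_):
    # Draw a random point in the unit square; the element passed in by filter() is ignored.
    x, y = random.random(), random.random()
    # True when the point lies inside the quarter circle of radius 1.
    return x*x + y*y < 1

# Count the samples that land inside the quarter circle.
count = sc.parallelize(range(0, num_samples)).filter(inside).count()

# The quarter circle has area pi/4, so pi is roughly 4 * (fraction of hits).
pi = 4 * count / num_samples
print(pi)

sc.stop()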
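
For reference, the same Monte Carlo estimate can be sketched in plain Python without Spark, which makes it easier to see why the result is multiplied by 4: the fraction of random points in the unit square that fall inside the quarter circle approximates pi/4. The function name `estimate_pi`, the `seed` parameter, and the sample count of 200,000 are illustrative assumptions, not part of the original snippet.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square."""
    # A dedicated, seeded generator keeps the run reproducible (assumption for illustration).
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Count the point if it lies inside the quarter circle of radius 1.
        if x * x + y * y < 1:
            hits += 1
    # hits / num_samples approximates pi / 4, the area of the quarter circle.
    return 4 * hits / num_samples


approx = estimate_pi(200_000)
print(approx)
```

The Spark version distributes exactly this loop: `filter(inside)` plays the role of the `if`, and `count()` plays the role of `hits`.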
xxleyi commented 5 years ago

Connecting to Cassandra

spark-submit --py-files /path/pyspark-cassandra-2.4.0.zip \
             --packages anguenot/pyspark-cassandra:2.4.0 \
             --conf spark.cassandra.connection.host=CASSANDRA_DB_IP \
             /path/your_script.py  # placeholder: the application script to submit

pyspark --py-files /path/pyspark-cassandra-2.4.0.zip \
        --packages anguenot/pyspark-cassandra:2.4.0 \
        --conf spark.cassandra.connection.host=CASSANDRA_DB_IP
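
When Spark is started from a plain Python process or notebook rather than through `spark-submit` or the `pyspark` shell, the same flags can usually be passed through the `PYSPARK_SUBMIT_ARGS` environment variable, which must be set before the SparkContext is created and must end with `pyspark-shell`. The following is a minimal sketch under that assumption; the helper name `build_submit_args` is hypothetical, and the path and `CASSANDRA_DB_IP` are the same placeholders used in the commands above.

```python
import os
import shlex


def build_submit_args(py_files, package, cassandra_host):
    """Build a PYSPARK_SUBMIT_ARGS string mirroring the command-line flags."""
    args = [
        "--py-files", py_files,
        "--packages", package,
        "--conf", f"spark.cassandra.connection.host={cassandra_host}",
        # PySpark expects this trailing token when launching its JVM gateway.
        "pyspark-shell",
    ]
    # Quote each argument so that values containing spaces survive shell-style splitting.
    return " ".join(shlex.quote(a) for a in args)


submit_args = build_submit_args(
    "/path/pyspark-cassandra-2.4.0.zip",   # placeholder path from the commands above
    "anguenot/pyspark-cassandra:2.4.0",
    "CASSANDRA_DB_IP",                     # placeholder: replace with the real host
)

# Must be set before pyspark.SparkContext(...) is created to take effect.
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
print(submit_args)
```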