minrk / findspark

BSD 3-Clause "New" or "Revised" License

what version of spark does this work with? #36

Closed · aedavids closed this issue 3 years ago

aedavids commented 3 years ago

I have not used Spark in several years. I have Jupyter installed on my Mac. With spark-2.3.0-bin-hadoop2, I could start it as follows:

export SPARK_HOME=sparkpath   # no spaces around "=" in shell assignments
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

pyspark $extraPkgs "$@"   # "$@" preserves argument quoting better than $*

PYSPARK_DRIVER_PYTHON is still supported https://spark.apache.org/docs/latest/configuration.html#environment-variables

PYSPARK_DRIVER_PYTHON_OPTS appears to be undocumented; I am not sure whether it is needed.

This approach continues to work. (I tested it using a trivial example from https://spark.apache.org/docs/latest/quick-start.html on spark-3.1.2-bin-hadoop3.2.)

I am not sure what your solution does?

I think your solution might be better: I would be able to start the Jupyter server once and use Spark only in the notebooks that need it.
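For reference, the per-notebook usage being described boils down to something like this (a sketch following findspark's README; the Spark path is illustrative, and the snippet is guarded so it also runs where findspark or Spark is not installed):

```python
# Per-notebook initialization: the Jupyter server starts normally, and only
# notebooks that need Spark run this cell. The try/except guard is only so
# the snippet is safe to run in environments without findspark/pyspark.
try:
    import findspark
    findspark.init()  # or findspark.init("/path/to/spark-3.1.2-bin-hadoop3.2")
    import pyspark
    spark_ready = True
except Exception:
    spark_ready = False  # findspark or a Spark installation is missing here
```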

Kind regards

Andy

minrk commented 3 years ago

I am not sure what your solution does?

It does very little! All it does is try to find where spark is installed and add it to sys.path so it's importable. That's really it. This is most of what the pyspark entrypoint does, but from the other direction - it knows where pyspark is, but needs to find Python, etc.

Doing it the other way around makes it easier to use a spark installation with any Python environment. Pyspark isn't really a special Python package - it's a very normal one. The only thing weird about it is how it's typically installed, which means some work needs to be done to make import pyspark work most of the time.

aedavids commented 3 years ago

Very cool!

This should make it easier for me to integrate Spark with my IDE and unit-test frameworks.
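For the unit-test case, one way this could look is a plain `unittest` suite that initializes Spark in `setUpClass` (a hedged sketch; the class and test names are illustrative, and the suite skips itself cleanly when findspark or a Spark installation is absent):

```python
# Sketch: letting a test runner or IDE start Spark itself via findspark,
# instead of launching tests through a pyspark wrapper script.
import unittest

class SparkSmokeTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        try:
            import findspark
            findspark.init()  # locate Spark and add it to sys.path
            from pyspark.sql import SparkSession
        except Exception as exc:
            raise unittest.SkipTest(f"Spark not available: {exc}")
        cls.spark = SparkSession.builder.master("local[1]").getOrCreate()

    @classmethod
    def tearDownClass(cls):
        if hasattr(cls, "spark"):
            cls.spark.stop()

    def test_range_count(self):
        self.assertEqual(self.spark.range(5).count(), 5)
```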

thanks
