mlverse / pysparklyr

Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect
https://spark.posit.co/deployment/databricks-connect.html

rpy2 issues on Windows preventing installation #125

Closed seugurlu closed 2 months ago

seugurlu commented 2 months ago

Hi.

As I understand it, since version 0.1.4 the install_pyspark() and install_databricks() functions install rpy2 as part of the Python environment they set up. This causes problems on Windows, where rpy2 is difficult to install. The rpy2 documentation also states that rpy2 is currently not supported on Windows (see the rpy2 documentation).

Although some online searching turned up suggestions for installing rpy2 on Windows machines, I could not get any of them to work on a corporate laptop with limited rights. With users like me in mind, could the rpy2 installation be made optional? I appreciate that this would make some features of pysparklyr unavailable, but the only workaround we have found so far is to stay on version 0.1.3, which is not ideal.

Thanks for your consideration.

edgararuiz commented 2 months ago

Thank you for the call out. I think we should be able to add an enhancement that skips installing rpy2 on Windows machines, and also add a check that automatically attempts to install it when spark_apply() is run. We do something similar for the huge TF libraries needed to run the ml_ functions: we don't install them until you try to run them for the first time.
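The idea described above (leave rpy2 out of the environment setup on Windows, then try to install it the first time it is actually needed) can be sketched roughly as follows. This is an illustration only, not pysparklyr's actual implementation: the function names `packages_for_setup` and `ensure_available`, and the `installer` callback, are hypothetical and do not come from the package.

```python
import importlib.util
import platform
from typing import Callable, List, Optional


def packages_for_setup(requested: List[str], system: Optional[str] = None) -> List[str]:
    """Return the packages to install up front (hypothetical helper).

    rpy2 is dropped when the operating system is Windows; everything
    else is installed as usual.
    """
    system = system or platform.system()
    return [p for p in requested if not (p == "rpy2" and system == "Windows")]


def ensure_available(
    package: str,
    installer: Callable[[str], None],
    is_installed: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Install `package` on first use if it is missing (hypothetical helper).

    Returns True if an install was attempted, False if the package was
    already importable. `installer` is supplied by the caller, for
    example a function that runs pip inside the target environment.
    """
    check = is_installed or (lambda name: importlib.util.find_spec(name) is not None)
    if check(package):
        return False
    installer(package)
    return True
```

A caller would use `packages_for_setup()` when building the environment and call `ensure_available("rpy2", ...)` at the start of whatever feature depends on rpy2, so users who never touch that feature never hit the Windows install problem.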

seugurlu commented 2 months ago

Thank you. Looking forward to it.

edgararuiz commented 2 months ago

@seugurlu - Can you try the dev version to make sure it solves your issue?

pak::pak("mlverse/pysparklyr@updates")

seugurlu commented 2 months ago

Apologies for the belated response. Yes, this now works. Thank you for the prompt solution.