Aside from SQL, we sometimes have more involved analyses, which we would typically run in R.
For example, we might already have a complicated regex for extracting license information coded in R.
(This is not a great example, because it could perhaps be done in plain SQL with custom functions in BigQuery, but it illustrates the point.)
For these expensive, non-SQL analyses we need a massively parallel processing (MPP) solution, ideally tightly integrated with our data warehouse.
We might have several MPP needs:
"native" Spark (without any additional R packages)
distributed R with spark_apply() (though this does not use containers, so dependency management may become iffy again)
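
To make the spark_apply() case concrete, here is a minimal sketch of what the license-regex example might look like. The function names, the `description` column, and the regex itself are hypothetical and only for illustration; the one real API used is `sparklyr::spark_apply()`, which runs an R function over each partition of a Spark DataFrame. The sketch sticks to base R and passes `packages = FALSE`, on the assumption that this sidesteps shipping R packages to the workers.

```r
# Hypothetical per-partition function: takes a data.frame with a character
# column `description` and returns it with a `license` column added.
# It is self-contained (base R only) so it can be shipped to Spark workers
# without extra package dependencies.
extract_license_partition <- function(df) {
  # Illustrative pattern only; a real license regex would be more involved.
  pattern <- "(GPL-[23]|MIT|Apache-2\\.0|BSD-[23]-Clause)"
  text <- as.character(df$description)

  m <- regexpr(pattern, text)
  found <- !is.na(m) & m > 0

  # Rows without a match (or with missing text) get NA.
  license <- rep(NA_character_, length(text))
  license[found] <- regmatches(text, m)

  df$license <- license
  df
}

# Hypothetical wrapper: `tbl` is assumed to be a Spark DataFrame obtained
# through sparklyr (for example via sparklyr::sdf_copy_to()).
# packages = FALSE asks sparklyr not to distribute local R packages.
run_on_spark <- function(tbl) {
  sparklyr::spark_apply(tbl, extract_license_partition, packages = FALSE)
}

# The per-partition function can be checked locally on a plain data.frame
# before involving a cluster at all.
local_df <- data.frame(
  description = c("Released under MIT", "GPL-3 or later", "no license here"),
  stringsAsFactors = FALSE
)
local_result <- extract_license_partition(local_df)
print(local_result)
```

As soon as the per-partition function needs non-base packages, every worker has to have them available, which is where the dependency concern noted above comes back.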