Closed MarcinKosinski closed 8 years ago
I can follow your reasoning, but this breaks with a convention of the RHadoop project (the Revolution-sponsored project within which this package was started): environment variables are the way packages are made aware of external dependencies. Some arguments against providing an alternate mechanism
That said, I am not totally sold on my own arguments and need to think about it. Feel free to push back. Maybe there is an argument for an R-only, shell-may-not-exist assumption. There may be a point of view, for instance that of a Windows user who is not as familiar with env variables, that clearly doesn't come naturally to me. The only thing I would like to see is a general answer to "how do we connect R with external dependencies" rather than a case-by-case decision that may be determined by an accident like incomplete or unclear installation instructions.
You have convinced me it is not necessary. In my opinion it would be better to write a manual/vignette for dummies (me) rather than add a new parameter. Since the philosophy of keeping parameters simple is consistent with other RHadoop packages, this topic can be closed.
Thanks, feel free to reopen if there are new elements to this discussion.
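As a rough idea of what such a vignette could show, here is a minimal sketch of setting the variable from inside R, which avoids touching the shell. The jar path is a made-up placeholder, not a real location:

```r
# Hypothetical path: replace it with the location of the driver .jar on your machine.
Sys.setenv(HADOOP_JAR = "/path/to/driver.jar")

# Check that the variable is visible to the current R session
# before calling src_Hive().
Sys.getenv("HADOOP_JAR")
```

Setting it this way only lasts for the current R session; for a permanent setting the same `HADOOP_JAR=...` line can go in the user's `.Renviron` file.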
Please forgive my technical/development ignorance. I am just a regular R user with statistical knowledge. I would like to propose a small improvement (at least, I think so :) ) which might help possible future users.
When I tried to use `dplyr.spark.hive` for the first time with the `src_Hive()` function, I didn't know I had to somehow specify a global environment variable such as `HADOOP_JAR`. When the function didn't work, the first step I took to figure out what I might have missed was a quick look at the documentation page. I thought I might have missed a special parameter, but there was no such parameter. That's why I posted questions on Stack Overflow and created issues here.

Maybe, to avoid such problems caused by the user's lack of knowledge, someone could add a new parameter to `src_Hive` so that it would be easier to understand that the user has to specify a `.jar` to create a `JDBC` driver. The name `classPath` would correspond to the name of the parameter in `JDBC`.

I think this might look like:
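The original snippet is not shown here, so what follows is only a minimal sketch of the idea, not the package's actual code. The names `resolve_class_path` and `src_Hive_sketch` are hypothetical, and the real `src_Hive()` signature may differ. It assumes an explicit `classPath` argument takes priority and `HADOOP_JAR` remains the fallback, so the existing convention keeps working:

```r
# Hypothetical helper, not part of dplyr.spark.hive: choose the driver .jar
# from an explicit argument, falling back to the HADOOP_JAR env variable.
resolve_class_path <- function(classPath = NULL) {
  if (is.null(classPath) || !nzchar(classPath)) {
    classPath <- Sys.getenv("HADOOP_JAR", unset = "")
  }
  if (!nzchar(classPath)) {
    stop("No driver .jar found: pass `classPath` or set HADOOP_JAR.")
  }
  classPath
}

# Hypothetical wrapper showing where the proposed argument would go;
# `...` stands in for whatever arguments src_Hive() already takes.
src_Hive_sketch <- function(..., classPath = NULL) {
  jar <- resolve_class_path(classPath)
  # In the package, this value would presumably be passed on as the
  # `classPath` argument of RJDBC::JDBC(); that call is left out here
  # so the sketch runs without a Java setup.
  jar
}
```

With this shape, `src_Hive_sketch(classPath = "/some/driver.jar")` would use the given file, while `src_Hive_sketch()` would behave as today and read `HADOOP_JAR`.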
It would also require some updates to the documentation page. I think such an update would make the package clearer and easier to use. If the proposal is OK, I can proceed with a PR :)