Closed MarcinKosinski closed 8 years ago
So what is the relation between this and pull request #15 ? Sorry for being dense, but we can't solve the same problem twice.
Via `...` in #15 one can pass the arguments described at the end of this answer: http://stackoverflow.com/a/26119143/3857701. This means:

```r
formalArgs(getMethod(dbConnect, "JDBCDriver"))
## [1] "drv"      "url"      "user"     "password" "..."
```
But how can one pass parameters to dbConnect when the original database URL looks like this:

```
jdbc:hive2://<host>:<port>/<db>;principal=hive/_HOST@<REALM>;ssl=true;sslTrustStore=<trust_store_path>;trustStorePassword=<trust_store_password>;transportMode=http;httpPath=<http_endpoint>
```
while through dplyr.spark.hive (`src_HS2`) one can only pass `jdbc:hive2://<host>:<port>/`? (I have asked about that on SO: http://stackoverflow.com/questions/34861844/passing-additional-parameters-to-dbconnect-function-for-jdbcdriver-in-r?noredirect=1#comment57470540_34861844.) Since this might not be doable, maybe one should think about extending the `url` parameter in `src_HS2`.
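For illustration, here is a minimal sketch of bypassing `src_HS2` and handing the full URL, extra settings included, straight to RJDBC's dbConnect. It assumes RJDBC is installed and that `HADOOP_JAR` points at the Hive JDBC jars, as in the transcript below; the host, database, and settings are placeholders.

```r
library(RJDBC)

# Driver built from the Hive standalone jar; the classPath here reuses
# the HADOOP_JAR environment variable set up later in this thread.
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
            classPath = Sys.getenv("HADOOP_JAR"))

# The extra settings can be appended to the url itself, exactly as in
# the long connection string above (placeholders, not real hosts):
url <- paste0("jdbc:hive2://tools-1.hadoop.srv:10000/loghost",
              ";auth=noSasl")

con <- dbConnect(drv, url)
```

This only shows that the URL string itself can carry the settings; whether the same settings work as separate dbConnect arguments is the open question in this thread.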
My question should rather be entitled *additional database connection configuration parameters*. I titled it *user authentication* as I am only interested in the `auth=noSasl` parameter.
So far I am using this workaround:

```r
> classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
+   "/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
+   "/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar")
> Sys.setenv(HADOOP_JAR = paste0(classPath, collapse = .Platform$path.sep))
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> library(dplyr.spark.hive)
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: package ‘dplyr.spark.hive’ was built under R version 3.2.2
2: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
3: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
>
> copy_to_from_local = dplyr.spark.hive:::copy_to_from_local
>
> my_db <- src_Hive(host = 'tools-1.hadoop.srv', port = "10000/loghost;auth=noSasl")
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>
> tbl(my_db, sql("select *
+                from TBL
+                where log_date = udf.current_day(-1)
+                limit 1000")) -> dane
>
> #dplyr.spark.hive:::tbl.src_HS2
```
So you are telling me #15 doesn't work? Did you test it at all?
It does, but it does not resolve all problems. I cannot pass 'username' and 'password' in the url, so I am passing them through `...` in dbConnect. Now I am trying to somehow pass the other arguments that are specified in the url, but this is probably not doable :)
The dbConnect method just stuffs all the `...` args into a java object and passes it to a java connect call. What that accepts should be documented here: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
I may have missed it, but have you tried `dbConnect(drv = mydrv, user = name, password = pwd, auth = token)` etc.?
When you say "it might probably be not doable", do you mean you have tried and run into trouble?
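For reference, the call suggested here would look like the sketch below. The driver object and credentials are placeholders; whether the HiveServer2 side honours `auth` as a named argument rather than a url segment is exactly what is being debated in this thread.

```r
library(RJDBC)

# Hypothetical driver setup; HADOOP_JAR is assumed to point at the
# Hive JDBC jars as elsewhere in this thread.
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
            classPath = Sys.getenv("HADOOP_JAR"))

# Extra named args after url are forwarded by RJDBC to the java-side
# connection properties instead of being spliced into the url string.
con <- dbConnect(drv,
                 url      = "jdbc:hive2://tools-1.hadoop.srv:10000/",
                 user     = "name",
                 password = "pwd",
                 auth     = "noSasl")
```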
I have tried various configurations of parameter names in `...` in dbConnect, and none of them enables me to connect with the regular short url `jdbc:hive2://host:port/`. What do you think this token should look like? Just a character, or a specific S3 / S4 class in R? I've tried with a character, but without success.
Token should be a character and it gets converted to a character before being passed to the java side. So to recap you have a token that works when provided as part of the URL but does not when provided as an additional arg.
Yes. So probably I should have a look at the JDBCDriver method for the dbConnect function in the RJDBC package, or in the java jars.
moving this to rzilla/dplyr.spark.hive#1
After reading the source code of your package I have learnt how to connect, via JDBC drivers, to HiveServer2 from the R statistical package. It is brilliant in its simplicity. I have realized that the following command will create a connection with HiveServer (even though I didn't specify a log4j-properties file), and the next statement will list the tables in a database. My connection will work only when I pass the user authentication parameters at the end of the `url` parameter, such as `;auth=noSasl`. So far it is impossible to pass such arguments to `src_HS2` (https://github.com/piccolbo/dplyr.spark.hive/blob/master/pkg/R/src-HS2.R#L39), since `url` only enables one to pass `host` and `port`. Do you think it is possible to somehow pass it via the `...` parameter to the `JDBC` function, or should one create a PullRequest with an additional parameter specifying what should come after `port/` in the `url` parameter in the `src_HS2` function? So far this is somehow the workaround/walkaround
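The commands referred to above did not survive in this copy of the issue. A hypothetical reconstruction of what such a session might look like, under the same assumptions as the rest of the thread (Hive JDBC jars on the classpath, a reachable HiveServer2; all names are placeholders, not the author's exact code):

```r
library(RJDBC)

# Hypothetical direct connection with the auth parameter appended to
# the url, sidestepping src_HS2's host/port-only url construction.
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
            classPath = Sys.getenv("HADOOP_JAR"))

con <- dbConnect(drv,
                 "jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl")

# List the tables visible in the connected database.
dbGetQuery(con, "show tables")
```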