PIvotalR - Error executing madlib summary (and other) commands

greenplum-db / PivotalR-archive

An convenient R tool for manipulating tables in PostgreSQL type databases and a wrapper of Apache MADlib.

https://pivotalsoftware.github.io/gp-r/

125 stars 53 forks source link

PIvotalR - Error executing madlib summary (and other) commands #20

Closed abhishek-agrawal closed 10 years ago

abhishek-agrawal commented 10 years ago

Hi Guys,

I am seeing some issues when I try to run madlib related commands via PivotalR. For example when I hit madlib.summary

Command madlib.summary(trainingData)

Error Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not run statement: ) Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not run statement: ) Error in db.data.frame(out.tbl, conn.id = conn.id(x), verbose = FALSE) : No such object in the connection 6

At the same time I am able to run madlib.version. Also I am able to execute dim(trainingData). What am i doing wrong? Any help would be much appreciated

walkingsparrow commented 10 years ago

Can you give me more details? And please try the latest master (v0.1.14.45), which has a better error handling for madlib.summary, and let me know the error message that you see.

If you could send a portion of your data to me, that would be nice. I could try it on my computer.

walkingsparrow commented 10 years ago

MADlib's summary function has a known bug. It cannot handle the case where the table name or column names contain capital letters. If your data table has upper case characters in the table name or column names, madlib.summary will fail due to MADlib's current limitation on summary.

abhishek-agrawal commented 10 years ago

I am currently using 0.1.11.3. Where can I download the latest version?

Also madlib.lm is also giving me an error.

Here is my code. next_year_sales and tot_sales are columns in my training table trainingData <- db.data.frame("schema_1.table1") madlib.lm(next_year_sales~tot_sales,data=trainingData)

and this gives me the error Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not Retrieve the result : ERROR: plpy.SPIError: no schema has been selected to create in (plpython.c:4648) CONTEXT: Traceback (most recent call last): PL/Python function "linregr_train", line 25, in return linear.linregr_train(**globals()) PL/Python function "linregr_train", line 116, in linregr_train PL/Python function "linregr_train" ) In addition: Warning message: package ‘RPostgreSQL’ was built under R version 3.0.0 Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not run statement: ) Error in db.data.frame(tbl.output, conn.id = conn.id, verbose = FALSE) : No such object in the connection 1

walkingsparrow commented 10 years ago

You can install the latest package from this github repository. See the Readme for how to install: https://github.com/gopivotal/PivotalR

walkingsparrow commented 10 years ago

Multiple bugs have been fixed since 0.1.11. Perhaps they have been fixed already.

And is it possible to share with me at least part of your data so that I could look into this issue?

abhishek-agrawal commented 10 years ago

Is there a way to download the binary?

walkingsparrow commented 10 years ago

No. The easiest way is to use the following 3 commands in R console:

install.packages("devtools") library(devtools) devtools::install_github("PivotalR", "gopivotal")

abhishek-agrawal commented 10 years ago

I am behind a corporate firewall. I cannot install packages. Also when I try to install from a source on windows I get an error

Error in untar2(tarfile, files, list, exdir) : unsupported entry type 'g'

walkingsparrow commented 10 years ago

Can you send me 100 rows of your data?

walkingsparrow commented 10 years ago

Just sent you a 32-bit windows binary PivotalR package.

abhishek-agrawal commented 10 years ago

Thanks much appreciated. Lay me check this out & revert

Sent with Good (www.good.com)

-----Original Message----- From: Hai Qian [notifications@github.commailto:notifications@github.com] Sent: Thursday, March 06, 2014 02:35 AM Eastern Standard Time To: gopivotal/PivotalR Cc: Agrawal, Abhishek Subject: Re: [PivotalR] PIvotalR - Error executing madlib summary (and other) commands (#20)

Just send you a 32-bit windows binary PivotalR package.

— Reply to this email directly or view it on GitHubhttps://github.com/gopivotal/PivotalR/issues/20#issuecomment-36831537.

This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email.

abhishek-agrawal commented 10 years ago

@walkingsparrow

I am sorry did you email me the framework? I dont have it?

walkingsparrow commented 10 years ago

I sent the binary package to your personal email.

abhishek-agrawal notifications@github.com于2014年3月6日星期四写道：

@walkingsparrow https://github.com/walkingsparrow

I am sorry did you email me the framework? I dont have it?

— Reply to this email directly or view it on GitHubhttps://github.com/gopivotal/PivotalR/issues/20#issuecomment-36964935 .

Pivotal http://www.gopivotal.com/ A new platform for a new era

walkingsparrow commented 10 years ago

Given the detailed information, I think I know what is wrong:

MADlib functions (and their PivotalR wrappers) create intermediate tables during the computation. PivotalR creates these tables in the default schema in the "search_path". So the search_path cannot be empty, otherwise PivotalR does not know which schema it is allowed to write.

So you will need to specify the search_path for this session. For example,

> connectionId = db.connect(host = hostName, user = userName,password = pass, 
   dbname = databaseName, madlib="madlib", default.schemas = "test")

Or you can view and set the search_path after the connection has been created

> db.default.schemas(connectionId) # list current session's search_path
> db.default.schemas(connectionId, set = 'test, "$user"')

I believe this will fix this problem.

On my side, I will update the package to let it raise a warning during db.connect if search_path is empty or an invalid schema name. The warning will make things easier to understand for the users.

walkingsparrow commented 10 years ago

This has been resolved.