greenplum-db / PivotalR-archive

A convenient R tool for manipulating tables in PostgreSQL-type databases and a wrapper of Apache MADlib.
https://pivotalsoftware.github.io/gp-r/

.unique.string() : problem with LC_TIME locale #12

Status: Closed (roolio closed this issue 10 years ago)

roolio commented 10 years ago

Hello, I'm using a freshly compiled version of PivotalR on Mac OS with a French locale. When running summary on a table object, I get:

"Error in db.data.frame(tbl.output, conn.id = conn.id, verbose = FALSE) : No such object in the connection 1"

I think the problem comes from the .unique.string() function used in temp table creation, specifically from its use of strptime(date(),"%c").

The point is that this function depends on the computer's LC_TIME locale. Mine was fr_FR.UTF-8, and strptime(date(),"%c") returned NA.

After this, the name found in Greenplum seems to be lowercased to "na", so the temp table name is not recognized when retrieving it from R.
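To illustrate the mismatch described above (a sketch only, not PivotalR's actual code; the `tmp_` prefix is hypothetical): R renders a missing value as "NA" when it is pasted into a string, while PostgreSQL-family databases fold unquoted identifiers to lowercase, so the two sides end up with different names.

```r
# Hypothetical temp-table name built from a failed date parse.
parsed <- NA                           # stand-in for what a failed strptime() yields
r_side_name <- paste0("tmp_", parsed)  # R keeps the name with upper-case "NA"

# Unquoted identifiers are folded to lowercase on the database side.
db_side_name <- tolower(r_side_name)

# The names differ, so a case-sensitive lookup from R cannot find the table.
names_match <- identical(r_side_name, db_side_name)
```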

If I set the locale with Sys.setlocale("LC_TIME", "C"), it works :)
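For anyone who wants the workaround without changing the locale for the whole session, here is a minimal sketch that switches LC_TIME to "C" only while an expression runs. The helper name `with_c_time_locale` is made up for illustration and is not part of PivotalR.

```r
# Evaluate an expression with LC_TIME temporarily set to "C",
# then restore whatever setting was active before.
with_c_time_locale <- function(expr) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  force(expr)  # the argument is evaluated lazily, i.e. after the switch
}

# The call from the report, now parsed under the "C" locale.
parsed <- with_c_time_locale(strptime(date(), "%c"))
```

This only sidesteps the symptom on the client side; the name construction itself still depends on the locale.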

A solution might be to make the construction of temp table names locale-independent.
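As a rough sketch of what locale-independent could look like (an assumption about one possible approach, not the fix that was eventually merged), the name can be built from parts that never pass through date parsing: the epoch time in seconds, the process id, and a random suffix. The function name `unique_table_name` and the `tmp` prefix are hypothetical.

```r
# Hypothetical helper: build a temp-table name from locale-independent parts.
unique_table_name <- function(prefix = "tmp") {
  epoch_secs <- as.integer(Sys.time())  # seconds since 1970-01-01, digits only
  pid <- Sys.getpid()                   # separates concurrent R sessions
  suffix <- paste(sample(c(letters, 0:9), 8, replace = TRUE), collapse = "")
  # Lowercase everything so the name survives identifier case folding.
  tolower(paste(prefix, epoch_secs, pid, suffix, sep = "_"))
}

example_name <- unique_table_name()
```

Note that sample() draws from R's random number generator, so calling this advances the session's RNG state.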

Hope it helps

(btw, kudos for your package. discovered it today & it really rocks)

Cheers

Julien

vatsan commented 10 years ago

Thanks for bringing this to our attention, Julien. Are you using PivotalR on Greenplum, or on Postgres as well? Do you have a Greenplum installation?

@walkingsparrow How about we use something like UUIDs for this (http://cran.r-project.org/web/packages/uuid/uuid.pdf) in place of creating one ourselves? Python has had one as part of the core language for a while now (http://docs.python.org/2/library/uuid.html).
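A minimal sketch of the UUID idea, assuming the CRAN `uuid` package and its `UUIDgenerate()` function. The helper name `uuid_table_name`, the `tmp` prefix, and the random-hex fallback for when the package is not installed are illustrative assumptions, not what the pull request does.

```r
# Hypothetical helper: a UUID-based temp-table name.
uuid_table_name <- function(prefix = "tmp") {
  if (requireNamespace("uuid", quietly = TRUE)) {
    id <- uuid::UUIDgenerate()
  } else {
    # Fallback without the package: 32 random hex digits (not a true UUID).
    id <- paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE), collapse = "")
  }
  # Hyphens are not allowed in unquoted SQL identifiers, so swap them out,
  # and lowercase the result so case folding in the database is a no-op.
  tolower(paste0(prefix, "_", gsub("-", "_", id, fixed = TRUE)))
}

example_uuid_name <- uuid_table_name()
```

With the default prefix the result stays well under PostgreSQL's 63-byte identifier limit.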

vatsan commented 10 years ago

@walkingsparrow Submitted pull-request, you may want to check this: https://github.com/gopivotal/PivotalR/pull/13

roolio commented 10 years ago

thanks @vatsan for your reply.

At MFG Labs (my company), we're happy users of Greenplum CE and Postgres as well: MADlib, PL/R, PL/Python. PivotalR looks like an important step forward, benefiting both the R environment and database-side computation. Really cool indeed! Looking forward to seeing other parts of MADlib integrated in PivotalR. I'm more of an R user than an R developer, but maybe once db.table et al. are in place, the biggest part is done.

Also, I was wondering whether Greenplum's cool aggregate expressions (like PERCENTILE_CONT) get special treatment in PivotalR, for instance when using the "by" function?

btw, UUID does seem like a safer choice

Cheers

Julien

walkingsparrow commented 10 years ago

thanks @vatsan, I will review the pull request and merge it.

And thanks, @roolio. Glad that people are starting to notice PivotalR.

Many aggregates are supported. PERCENTILE_CONT is not. Maybe we should add it in the next version.

vatsan commented 10 years ago

@roolio Thank you for sharing your feedback. We love to hear from our users & customers, and that motivates us a lot to contribute more to open source.

If you have folks at your company who would describe themselves as Python fans, you might also be interested in our other open source project "PyMADlib" (https://github.com/gopivotal/pymadlib) - which is a Python wrapper for MADlib.

Do let us know if you use it. I would be happy to add more functionality if I see more adoption.

roolio commented 10 years ago

@vatsan I haven't checked PyMADlib yet, but definitely will (we do have some Pythonistas at work!). @walkingsparrow: thanks for the feedback. I'll check which UDAGs the interface allows, but yes, percentile would be great.

Cheers

walkingsparrow commented 10 years ago

The problem has been fixed in master.