kylemaxxwell / rpostgresql

Automatically exported from code.google.com/p/rpostgresql

dbWriteTable permission problem on bulk write #13

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
In RPostgreSQL 0.1-6, bulk copying using dbWriteTable fails if the backend
user (e.g. postgres) does not have read access to the (temporary) data file
created by the R user.

On my Mac OS X 10.5 system, dbWriteTable(con, name, value) creates a temporary file
in /tmp/Rtmp.../rsdbi... with the contents of 'value' saved as a
tab-delimited text file. The permissions on this file and the containing
directory are 0600, meaning the file is unreadable except by the R
user. dbWriteTable then sends the COPY command to Postgres (COPY table_name
FROM 'filename'). If Postgres is running as a different user than R,
the file is unreadable and a permission error occurs.

One solution is to explicitly change the permissions on the file before
issuing the COPY. Below is a patch to the postgresqlWriteTable function in
the R/PostgreSQLSupport.R file:

[MAC] > diff PostgreSQLSupport.R PostgreSQLSupport_edit.R
633,636c633
<     if(as.character(Sys.info()["sysname"])=="Linux")
<         fn <- tempfile("rsdbi","/tmp")
<     else
<         fn <- tempfile("rsdbi")

---
>     fn <- tempfile("rsdbi","/tmp")
638a636
>     Sys.chmod(fn, mode="0744")
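
For readers applying this by hand, the patched logic also runs as a standalone sketch (the write.table call here is a stand-in for RPostgreSQL's internal serialization of 'value'; the sample data frame is illustrative):

```r
# Always create the temp file in /tmp, then open its permissions so a
# backend running as another user (e.g. postgres) can read it.
fn <- tempfile("rsdbi", "/tmp")
write.table(data.frame(x = 1:3), fn, sep = "\t",
            row.names = FALSE, col.names = FALSE)
Sys.chmod(fn, mode = "0744")  # owner rwx, read-only for group/other
file.access(fn, mode = 4)     # returns 0 when the file is readable
```

Note that 0744 makes the file readable by every user on the machine, which may be undesirable for sensitive data.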

--Robert

Original issue reported on code.google.com by rmcge...@gmail.com on 22 Feb 2010 at 6:52

GoogleCodeExporter commented 9 years ago
The proposed change will only solve the problem on unified systems that have
both the database and R running on the same machine. I'm running into this same
error when I try to load a data frame from R running on one Linux box into
PostgreSQL running on another Linux box... permissions are only part of the
problem in this case; the main one is the availability of that local /tmp dir
on the database machine.

Original comment by dschr...@gmail.com on 13 May 2010 at 3:33

GoogleCodeExporter commented 9 years ago
For the case of multiple systems, I created a shared folder on the host
computer that was mounted to the same directory on the client computers via
Samba. I then changed the tempfiles to save there. Very kludgy, but it solved
my limited needs with limited work.

The best solution would probably use the same mechanism as pg_dump and
pg_dumpall, with the data coming from stdin all at once, e.g.:

COPY table_name FROM stdin;

That solution should be even faster than the current approach because the data
doesn't have to be written to disk twice (first as a tempfile, second into
Postgres). However, the whole upload would need to occur in one transaction.
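
The stream such an upload would carry can be previewed without a database; a minimal sketch (column names and values are made up) that serializes a data frame straight to memory instead of a temp file:

```r
# Format rows as the tab-delimited text that a text-mode
# COPY ... FROM STDIN would consume; nothing touches the disk.
df <- data.frame(id = 1:2, name = c("ann", "bob"))
tc <- textConnection("copybuf", "w")
write.table(df, tc, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
close(tc)
cat(copybuf, sep = "\n")
```

With the libpq copy API, the end of data is signalled by PQputCopyEnd() rather than by a marker embedded in the stream.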

Original comment by rmcge...@gmail.com on 15 May 2010 at 1:14

GoogleCodeExporter commented 9 years ago
A mutually mounted shared folder seems like it could work. I could just use a
temporary directory in the user's home dir like /home/fred/tmp/ ... It might be
nice to have "/tmp" in the code above read from an environment variable, or
from a new parameter (say something like 'tmp.dir') to dbWriteTable(), instead.

Original comment by dschr...@gmail.com on 17 May 2010 at 8:11

GoogleCodeExporter commented 9 years ago
For example, the following edit works for me:

[someuser@linuxmachine]$ diff PostgreSQLSupport.R PostgreSQLSupport-edit.R 
633,636c633,640
<     if(as.character(Sys.info()["sysname"]) %in% c("Linux", "Darwin"))
<         fn <- tempfile("rsdbi","/tmp")
<     else
<         fn <- tempfile("rsdbi")
---
>     tmp.dir <- Sys.getenv('R_DB_TMP')
>     if(tmp.dir == '')
>       tmp.dir <- '/tmp'
>     if(!file.exists(tmp.dir))
>       dir.create(tmp.dir)
>     fn <- tempfile("rsdbi",tmp.dir)
>     Sys.chmod(fn, mode="0744")

as long as I'm sure to assign this environment variable before I make a call
to dbWriteTable:

Sys.setenv(R_DB_TMP= '/home/someuser/tmp')
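
Put together, the environment-variable lookup from the diff runs standalone (R_DB_TMP is the hypothetical variable name introduced by the patch; the directory below is just an example):

```r
# Resolve the temp directory from R_DB_TMP, falling back to /tmp and
# creating it if needed, mirroring the patched postgresqlWriteTable.
Sys.setenv(R_DB_TMP = file.path(tempdir(), "dbtmp"))
tmp.dir <- Sys.getenv("R_DB_TMP")
if (tmp.dir == "") tmp.dir <- "/tmp"
if (!file.exists(tmp.dir)) dir.create(tmp.dir)
fn <- tempfile("rsdbi", tmp.dir)
```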

Original comment by dschr...@gmail.com on 17 May 2010 at 9:12

GoogleCodeExporter commented 9 years ago
As stated in the comments in the source and in the TODO file, the bulk copy
operation should be done over the socket rather than by passing a pathname.

How to achieve this? There is a hint in psql's \copy operation, which does
something very similar to what we want.

The essence is to use "COPY tablename FROM STDIN" and send the data with
repeated calls to PQputCopyData(), terminated by a single call to
PQputCopyEnd().

A proof-of-concept patch without sufficient error handling is attached.
(The patch is relative to RPostgreSQL_0.1-6.)

Also, I believe issue 9 is the same problem.

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 11 Sep 2010 at 7:19

GoogleCodeExporter commented 9 years ago
This should have been fixed as of r144.

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 13 Oct 2010 at 2:45