harrelfe / Hmisc

Harrell Miscellaneous

error in csv.get under R 3.1.0 #14

Open ringprince opened 10 years ago

ringprince commented 10 years ago

Hi,

I have an issue with mdb.get (or csv.get). Interestingly, it occurs only with R version 3.1.0 and not with R version 3.0.0.

The installed versions of Hmisc are:

R 3.1.0: Hmisc_3.14-4
R 3.0.0: Hmisc_3.10-1

My R session is:

library("Hmisc")
tmp <- mdb.get("some.mdb")

This works with R 3.0.0, but not with R 3.1.0 (I tried both versions of Hmisc with the same result):

Error in `label<-.default`(`*tmp*`, value = NULL) :
  value must be character vector of length 1
> traceback()
6: stop("value must be character vector of length 1")
5: `label<-.default`(`*tmp*`, value = NULL)
4: `label<-`(`*tmp*`, value = NULL)
3: cleanup.import(w, labels = if (length(labels)) labels else if (changed) n else NULL,
       datevars = datevars, datetimevars = datetimevars, dateformat = dateformat,
       fixdates = fixdates, charfactor = charfactor)
2: csv.get(f, datetimevars = datetime, lowernames = lowernames,
       allow = allow, dateformat = dateformat, ...)
1: mdb.get("some.mdb")

What is going on here? Is this an encoding issue?
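Reading the traceback from the bottom up, mdb.get calls csv.get, which calls cleanup.import, and somewhere in there a label assignment receives value = NULL and is rejected by a length check. The sketch below mimics that check in base R only. `set_label` is a hypothetical stand-in written for illustration, not the Hmisc source, and it does not explain why the value ends up NULL under R 3.1.0.

```r
# Hypothetical stand-in for a label setter; NOT the Hmisc implementation.
set_label <- function(x, value) {
  if (length(value) != 1L || !is.character(value)) {
    stop("value must be character vector of length 1")
  }
  attr(x, "label") <- value
  x
}

x <- c(1.5, 2.5)

# NULL has length 0, so the check fails with the message seen in the thread
res <- tryCatch(set_label(x, NULL), error = function(e) conditionMessage(e))
res

# A defensive caller would skip the assignment when no label is available
lab <- NULL
if (length(lab) == 1L) x <- set_label(x, lab)
attr(x, "label")  # still NULL, no error raised
```

Whether such a guard belongs in the caller or in the setter is a design choice; the sketch only shows that a NULL label is enough to trigger the message.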

All of this is happening on the same machine running Ubuntu 12.04.2 LTS.

Please let me know if you need more information. I cannot post 'some.mdb' but I could send it privately, if necessary.

Many thanks in advance!

harrelfe commented 9 years ago

I'm sorry I didn't get to this earlier. I ran the function on my test database and found another problem due to mdb-export producing binary output on occasion. I added the -b strip option to the call to mdb-export and just committed the code. Let me know if you still have a problem.

ringprince commented 9 years ago

This has not improved for me. It seems the -b option is not supported? I've installed the latest version from GitHub via devtools.

This is what I get now:

> library("Hmisc")
Loading required package: grid
Loading required package: lattice
Loading required package: survival
Loading required package: splines
Loading required package: Formula

Attaching package: ‘Hmisc’

The following objects are masked from ‘package:base’:

    format.pval, round.POSIXt, trunc.POSIXt, units

> tmp <- mdb.get("MFGTMP-PC_141007150001.mdb")
mdb-export: invalid option -- 'b'
Can't alloc filename
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  no lines available in input
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   grid      stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Hmisc_3.14-6    Formula_1.1-2   survival_2.37-7 lattice_0.20-29

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3     cluster_1.15.2      foreign_0.8-61
[4] latticeExtra_0.6-26 nnet_7.3-8          RColorBrewer_1.0-5
[7] rpart_4.1-8         tools_3.1.0

harrelfe commented 9 years ago

What version of the mdbtools system package are you using?

Frank E Harrell Jr Professor and Chairman School of Medicine

Department of *Biostatistics*   *Vanderbilt University*

ringprince commented 9 years ago

I am not sure which upstream version that corresponds to, but my mdbtools comes from mdbtools_0.7~rc1-4_amd64.deb.

harrelfe commented 9 years ago

I'm using mdbtools_0.7.1-1ubuntu1_amd64.deb, so I don't see how there could be a problem with the -b strip option to the mdb-export system command. I just ran my test mdb database through a new version of Hmisc with the updated function and it worked fine.

ringprince commented 9 years ago

On the system command line I see no mention of a '-b' option:

$ mdb-export
Usage: mdb-export [options] <file> <table>
where options are:
  -H             supress header row
  -Q             don't wrap text-like fields in quotes
  -d <delimiter> specify a column delimiter
  -R <delimiter> specify a row delimiter
  -I <backend>   INSERT statements (instead of CSV)
  -D <format>    set the date format (see strftime(3) for details)
  -q <char>      Use <char> to wrap text-like fields. Default is ".
  -X <char>      Use <char> to escape quoted characters within a field. Default is doubling.
  -N <namespace> Prefix identifiers with namespace

So, I guess it's time to try a newer mdbtools.
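The same check can be scripted from R by scanning the usage text for the flag. This is a minimal sketch that assumes the usage layout printed above; `has_mdb_option` is a hypothetical helper, not part of Hmisc or mdbtools.

```r
# Hypothetical helper: does the usage text list a given option flag?
has_mdb_option <- function(usage, flag) {
  # Option lines start with optional whitespace, then the flag itself
  pattern <- paste0("^[[:space:]]*", flag, "([[:space:]]|$)")
  any(grepl(pattern, usage))
}

# Usage text as printed by the mdb-export shown above (abbreviated)
usage <- c(
  "Usage: mdb-export [options] <file> <table>",
  "where options are:",
  "  -H             supress header row",
  "  -Q             don't wrap text-like fields in quotes",
  "  -d <delimiter> specify a column delimiter"
)

has_mdb_option(usage, "-H")  # TRUE
has_mdb_option(usage, "-b")  # FALSE

# On a live system the usage text could be captured with something like:
# usage <- suppressWarnings(system2("mdb-export", stdout = TRUE, stderr = TRUE))
```

The flag is pasted into a regular expression as-is, which is fine for simple flags such as "-b" but would need escaping for anything containing regex metacharacters.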

ringprince commented 9 years ago

Indeed: the version in Debian stable does not have the '-b' option. The version in the now frozen Debian testing (0.7.1-2) does have that option. After upgrading both mdbtools and libmdb2 to the testing versions, the invalid option -- 'b' error is gone. Now I am left with the original error again.

The file generating the error is still sitting on my system. If that is of any help, I can make that available for you privately.

harrelfe commented 9 years ago

I'm glad that updating mdbtools got you past that problem. If you can securely deposit an mdb file that fails for you, I can debug. Go to https://data.vanderbilt.edu/data-hippo/ and use this email address: f.harrell@vanderbilt.edu. If the file is not sensitive and not too large, you can just email it as an attachment.

ringprince commented 9 years ago

I have uploaded the file, which is too big to attach and which I cannot share publicly. Thanks for looking into this.

harrelfe commented 9 years ago

As a check, was your .mdb file 182554624 bytes?

I ran mdb.get on it on the latest test version of Hmisc, on Xubuntu 14.10, and everything ran fine. The resulting R object containing all the dataframes is 13MB.
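A size comparison like this can be done from R on both ends. The sketch below uses only base R; `same_size` is a hypothetical helper, and the temporary file merely stands in for the real database.

```r
# Hypothetical helper: TRUE if the file exists and has exactly the given size
same_size <- function(path, expected_bytes) {
  size <- file.info(path)$size  # NA when the file does not exist
  !is.na(size) && size == expected_bytes
}

# Demonstration on a temporary 1024-byte file
tmp <- tempfile(fileext = ".mdb")
writeBin(raw(1024), tmp)
same_size(tmp, 1024)

# For the file discussed in this thread the call would be:
# same_size("MFGTMP-PC_141007150001.mdb", 182554624)
# A checksum (tools::md5sum) would be a stricter test than the size alone.
```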

ringprince commented 9 years ago

Yes, the file size is correct.

I still get the issue:

System               R         Status
Ubuntu 12.04.2 LTS   R 3.0.0   fine
OS X 10.9            R 3.1.2   fine
Ubuntu 12.04.2 LTS   R 3.1.0   fail

Maybe the problem is specific to R 3.1.0?

Here is the complete output of the last combination again:

R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library("Hmisc")
Loading required package: grid
Loading required package: lattice
Loading required package: survival
Loading required package: splines
Loading required package: Formula

Attaching package: ‘Hmisc’

The following objects are masked from ‘package:base’:

    format.pval, round.POSIXt, trunc.POSIXt, units

> mdb.get("MFGTMP-PC_141007150001.mdb")
Error in `label<-.default`(`*tmp*`, value = NULL) :
  value must be character vector of length 1
> traceback()
6: stop("value must be character vector of length 1")
5: `label<-.default`(`*tmp*`, value = NULL)
4: `label<-`(`*tmp*`, value = NULL)
3: cleanup.import(w, labels = if (length(labels)) labels else if (changed) n else NULL,
       datevars = datevars, datetimevars = datetimevars, dateformat = dateformat,
       fixdates = fixdates, charfactor = charfactor)
2: csv.get(f, datetimevars = datetime, lowernames = lowernames,
       allow = allow, dateformat = dateformat, ...)
1: mdb.get("MFGTMP-PC_141007150001.mdb")
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   grid      stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Hmisc_3.14-6    Formula_1.1-2   survival_2.37-7 lattice_0.20-29

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3     cluster_1.15.2      foreign_0.8-61
[4] latticeExtra_0.6-26 nnet_7.3-8          RColorBrewer_1.0-5
[7] rpart_4.1-8         tools_3.1.0

ringprince commented 9 years ago

I have now compiled R 3.1.2 on the Ubuntu machine as well and it works. So, I assume it is an issue specific to R 3.1.0.
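For scripts that have to run on several machines, a version guard could flag the one combination that failed here. This is only a sketch under the assumption that R 3.1.0 really is the trigger, which the thread does not establish; `is_r_310` is a hypothetical helper.

```r
# Hypothetical helper: is the given (or running) R version exactly 3.1.0?
is_r_310 <- function(v = getRversion()) {
  numeric_version(as.character(v)) == "3.1.0"
}

is_r_310("3.1.0")
is_r_310("3.1.2")

# Possible use before importing:
if (is_r_310()) {
  warning("mdb.get was reported to fail under R 3.1.0; consider upgrading R")
}
```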

harrelfe commented 9 years ago

I can't imagine why this is R-related, but I'm glad you got it to work. I'm using the latest production version of R (3.1.1) and it works; no need to compile R.
