harrelfe / Hmisc

Harrell Miscellaneous
Other
208 stars 81 forks source link

Fix for mdb.get() on Windows #135

Closed rhurlin closed 3 years ago

rhurlin commented 4 years ago

mdb.get() does not work as expected on Windows

Reading Microsoft Access database files via mdb.get() works fine for me on FreeBSD without any problems for years now. Many thanks for the nice package!

Now my team at work also has to read mdb files directly (without an odbc connection) on Windows 10. Reading mdb files and tables directly on the command line works fine with installed MDBTools [1] under Windows 10 (64 bit), i.e. mdb-export -b strip d:\path\to\mdb.file TABLE. So the needed underlying tools are present and usable.

In a next step we tried the same R scripts we are using on Unix to read the mdb files. It turns out, that mdb.get() from Hmisc is not able to read tables from the mdb files under Windows 10 at all(?). In all our trials, table queries returned without reading in any data.

After investigating a bit deeper with debugger tools, I think I was able to isolate the cause: The function mdb.get() in Hmisc/R/mdb.get.s uses system() at line 25 to put a complete string to the windows command line via:

system(paste("mdb-export", mdbexportArgs, file, shQuote(tab), '>', f))

In our case, this was something like

system("mdb-export -b strip ./data/db.mdb TABLE > c:/Users/USERNAME/AppData/Local/Rwrkdir/file-tmpxasdflknc78a2")

gives the error message

Wrong number of arguments.

Usage:
  MDB-EX~1.EXE [OPTION…] <file> <table> - export data from MDB file

Help Options:
  -h, --help                    Show help options

Application Options:
  -H, --no-header               Suppress header row.
  -Q, --no-quote                Don't wrap text-like fields in quotes.
  -d, --delimiter=char          Specify an alternative column delimiter. Default is comma.
  -R, --row-delimiter=char      Specify a row delimiter
  -q, --quote=char              Use <char> to wrap text-like fields. Default is double quote.
  -I, --backend=backend         INSERT statements (instead of CSV)
  -D, --date_format=format      Set the date format (see strftime(3) for details)
  -X, --escape=format           Use <char> to escape quoted characters within a field. Default is doubling.
  -N, --namespace=namespace     Prefix identifiers with namespace
  -0, --null=char               Use <char> to represent a NULL value
  -b, --bin=strip|raw|octal     Binary export mode
  -B, --boolean-words           Use TRUE/FALSE in Boolean fields (default is 0/1)

Obviously, there is something wrong with the last part of the string. The part with the pipe > (bigger sign) and the path to the tempfile are interpreted as additional parameters to the mdb-export command instead of a pipe (or redirect) to the command line.

This happens, because system() is not very good in handling piping or redirection, see more [2]. For this, system2() is a much better choice [3]. So I changed line 25f in mdb.get.s to use system2() instead and it works like expected now:

system2(command = 'mdb-export', args = paste(mdbexportArgs, file, shQuote(tab)), stdout = f)

This also works on Unix-likes and has been tested for some time now. I would expect it also to work on macos, but had no possibility to test.

[1] https://github.com/lsgunth/mdbtools-win [2] https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/system [3] https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/system2

Hint towards MDBTools on Windows

I would like to suggest another proposal, since I am wondering, if the presence of a working MDB toolset on Windows, usable by Hmisc, is well known enough. Perhaps it would be a good idea to point out about it in the vignette/documentation of the Hmisc package?

Something like the following has helped our team members:

Thanks to Logan Gunthorpe, on Github there is a MDBTools version 0.7-220-gd6f5745 from Dec, 12th 2017, using MSYS2 for 32bit Windows. This also works fine on Windows 10 64bit:

https://github.com/lsgunth/mdbtools-win

In this repository, there are no sources, only binaries and needed dll files. You will find the following executables:

mdb-array.exe, mdb-export.exe, mdb-header.exe, mdb-import.exe, mdb-parsecsv.exe, mdb-prop.exe, mdb-schema.exe, mdb-sql.exe, mdb-tables.exe, mdb-ver.exe, prcat.exe, prdata.exe, prdump.exe, prindex.exe, prkkd.exe, prole.exe, prtable.exe, updrow.exe

It is pretty easy as a normal Windows user to download the repository as a zip file from

https://github.com/lsgunth/mdbtools-win/archive/master.zip

and afterwards extract the contents of the zip file into a user accessable directory, i.e.

C:\Users\USERNAME\AppData\Local\MDBTools

'USERNAME' should be replaced by your Windows accounts user name.

Finally you have to add the MDBTools path to your user environment by executing the following on the windows command line:

pathman /au c:\Users\USERNAME\AppData\Local\MDBTools

Now commands like mdb-export should be reachable by other software like mdb.get() from R package Hmisc.