jbryer / ipeds

R R package for accessing the Integrated Postsecondary Education Data System (IPEDS) from R.
48 stars 18 forks source link

Hmisc::mdb.get not available on Windows #9

Open mikesmith2468 opened 5 years ago

mikesmith2468 commented 5 years ago

With the absence of the Hmisc::mdb.get function on Windows, it appears the only supported way to connect to Access databases on Windows is via ODBC. I've managed to get this working via RODBC (without needing to create the DSNs via the Windows administrative tools) but transferring the data from the Access file into R is unworkably slow for the larger tables. I got bored of waiting after 24 hours for one of them, on a reasonably capable machine. It also has significant implementation challenges depending on the "bit-ness" of both the R install and the Access install.

Instead I have a helper function that I'm testing that uses getIPEDSSurvey() to pull down all the zipped CSVs from NCES for a given year, extracts them and imports them into R individually, and then saves them all as an Rds file. My hope is that this could take the place of the download_ipeds() function on Windows machines only: as far as I have been able to tell, the resulting Rds file looks identical to the one that comes from Hmisc::mdb.get, and so your existing functions for working with the data in the Rds should not need to be changed. However, it does rely on the continued existence of the getIPEDSSurvey() function.

Thoughts?

-m.

jbryer commented 5 years ago

This seems like a fine work around. I have not intention of removing the the getIPEDSurvey anytime soon. As long as it still works, I can keep it. If this turns out to be the only viable solution for Windows, I can remove the warning message. It is unfortunate that government agencies seem to like to distribute their data in proprietary formats. NYS Dept of Education also releases their data (only) in MS Access.

On Tue, Aug 27, 2019 at 1:53 PM mikesmith2468 notifications@github.com wrote:

With the absence of the Hmisc::mdb.get function on Windows, it appears the only supported way to connect to Access databases on Windows is via ODBC. I've managed to get this working via RODBC (without needing to create the DSNs via the Windows administrative tools) but transferring the data from the Access file into R is unworkably slow for the larger tables. I got bored of waiting after 24 hours for one of them, on a reasonably capable machine. It also has significant implementation challenges depending on the "bit-ness" of both the R install and the Access install.

Instead I have a helper function that I'm testing that uses getIPEDSSurvey() to pull down all the zipped CSVs from NCES for a given year, extracts them and imports them into R individually, and then saves them all as an Rds file. My hope is that this could take the place of the download_ipeds() function on Windows machines only: as far as I have been able to tell, the resulting Rds file looks identical to the one that comes from Hmisc::mdb.get, and so your existing functions for working with the data in the Rds should not need to be changed. However, it does rely on the continued existence of the getIPEDSSurvey() function.

Thoughts?

-m.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jbryer/ipeds/issues/9?email_source=notifications&email_token=AAEYJD5OIZYBAAMHRN7YJWDQGVSYNA5CNFSM4IQHRFB2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HHW7SZQ, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEYJD7BWOD4C2G6UTE7CPLQGVSYNANCNFSM4IQHRFBQ .

mikesmith2468 commented 5 years ago

Indeed. It's also a little ironic that getting to the data is straightforward on non-MS platforms but next to impossible on Windows... Anyway, I'll keep testing and I should have something for you in the next few days.

jbryer commented 5 years ago

Sounds good. There is another bug with Windows I haven't been able to track down yet. I don't currently have access to a Windows computer, but with the semester starting I might.

On Tue, Aug 27, 2019 at 4:25 PM mikesmith2468 notifications@github.com wrote:

Indeed. It's also a little ironic that getting to the data is straightforward on non-MS platforms but next to impossible on Windows... Anyway, I'll keep testing and I should have something for you in the next few days.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jbryer/ipeds/issues/9?email_source=notifications&email_token=AAEYJDZ6L2VAOJRL3GORJULQGWEURA5CNFSM4IQHRFB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5JAGYY#issuecomment-525468515, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEYJD2BZBN2SZXUCBSWQUTQGWEURANCNFSM4IQHRFBQ .

mikesmith2468 commented 5 years ago

Yeah, that one is down to the trailing '/' in the exdir parameter to unzip(); for some reason you have to remove that on Windows, whereas other platforms don't care whether it's there or not. I think it's a known issue.

For your package, I think it's not a problem -- it only shows up when trying to unzip the Access file, and that's not going to lead to anything usable on Windows anyway. My workaround uses getIPEDSSurvey, which doesn't suffer from the same issue. So I think we can probably fix both the Windows issues at once.

tyleransom commented 4 years ago

@mikesmith2468 Do you happen to have a gist (or fork) that shows this functionality? I am a Windows user experiencing all of these known issues.

jbryer commented 4 years ago

I pushed an update to Github that may fix this. Can you verify @tyleransom

Also, I tried to find an alternative way to use the MS Access files (the new way IPEDS provides data) on Windows. I have a few lines of code at the bottom of this file: https://github.com/jbryer/ipeds/blob/master/ipeds.dev.R The bad news is I cannot get this to work. I suspect it has to do with the file bing 32 bit and I was trying to run this on 64 bit Windows. If someone can figure this out, let me know as I don't have time nor desire to figure out a weird Windows issue.

tyleransom commented 4 years ago

Thanks, @jbryer I will give it a shot and see if I can figure it out.

mikesmith2468 commented 4 years ago

I made it work (by spawning a 32-bit R instance and connecting through there) but it's too slow to be usable.

jbryer commented 4 years ago

Were you on 64 bit version of Windows? I tried switching R to 32-bit in Rstudio but it didn't work for me.

On Fri, Nov 8, 2019 at 2:12 PM Michael Smith notifications@github.com wrote:

I made it work (by spawning a 32-bit R instance and connecting through there) but it's too slow to be usable.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jbryer/ipeds/issues/9?email_source=notifications&email_token=AAEYJD3HC6AX4QYHZHEQE4LQSW2YDA5CNFSM4IQHRFB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDTCNFY#issuecomment-551954071, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEYJD2TQIYUXTRR2YB4EV3QSW2YDANCNFSM4IQHRFBQ .

mikesmith2468 commented 4 years ago

I was on 64-bit, yes.

Here's the code I was experimenting with -- I think I found it on Stack Overflow or somewhere like that. It seems to work after a fashion, but I found it unworkably slow as I said earlier. Could be something weird in my setup, though. access_query_32.zip