kylebarron / stata_kernel

A Jupyter kernel for Stata. Works with Windows, macOS, and Linux.
https://kylebarron.dev/stata_kernel/
GNU General Public License v3.0
262 stars 55 forks source link

odbc connections #395

Open michaelewens opened 3 years ago

michaelewens commented 3 years ago

I use the odbc command to load mysql and sqlite databases. The commands work in a Stata instance, but it appears that the kernel's Stata instance does not recognize odbc list. I get the standard error related to missing ODBC drivers.

The ODBC file libiodbc.dylib could not be found on this system. Setting the unix LD_LIBRARY_PATH environment variable may correct this error.

I would not call this a bug because I continue to ask myself why I keep using Stata + SQL (it is difficult to set up with Stata on its own). How to replicate:

Is there a configuration in the kernel that ensures it sees the odbc databases available on the system in the way that Stata sees? i.e. how can I get odbc list to return the correct list?

mcaceresb commented 3 years ago

I would suspect that if you need to start jupyter with LD_LIBRARY_PATH defined?

env LD_LIBRARY_PATH=/path/to/libiodbc/folder jupyter notebook --kernel stata

(or console or qtconsole instead of notebook) it might help?

michaelewens commented 3 years ago

This helped me find part of the solution (for me, Mac OS):

  1. Figure out where you libiodbc.dylib is: find / -name libiodbc.dylib 2>&1 | grep -v "find: "
  2. Run jupyter lab --generate-config
  3. Paste in the following to that newly created file (if new!)

import os c = get_config() os.environ['LD_LIBRARY_PATH'] = 'WHERE THAT FILE IS' c.Spawner.env.update('LD_LIBRARY_PATH')'

  1. In your notebook file, make sure you have set odbcdriver ansi, permanently at the top (because Stata) and define your driver (e.g., set odbcmgr iodbc, permanently)

While this solves the odbc list issue (i.e., no more missing path or r(680) error), another weird issue emerges. How to replicate

The result: "Stata | Busy" and no reporting / results from the cell. Simply, only the first run of an odbc command works and all subsequent runs hang.

Is there some issue with the kernel talking to odbc and somehow not closing things after a first run?

mcaceresb commented 3 years ago

@michaelewens Can you post the log?

michaelewens commented 3 years ago

Spoke too soon. It works, but slowly. It is just that odbc load takes 15X longer using notebook + Stata kernel vs. native Stata (i.e., no errors). Any thoughts on what could be behind the speed difference?

mcaceresb commented 3 years ago

It's hard to say without more details (and given I've never used odbc with Stata).

When you say 15x, do you mean a difference between 1s and 15s, or more like 1 minute vs 15? If the former then maybe it's some timeout issue (where the way the kernel works is that it expects certain stuff to be printed to the log/console, and if it doesn't find it right away then it loops). That wouldn't explain it if the difference in the times is too big.

It's also possible that some internal functions of the kernel don't play nice with Stata when there is an open odbc connection, but that is harder to debug. If you set rmsg on, run your commands, and then post the log this would give a clue as to what it might be.

michaelewens commented 3 years ago

It looks like >15X. Here is the current output of a simple odbc call with the set rmsg on. After 5 minutes, no change in in the rmsg output and "Stata Busy" remains. I think it is the "playing nice" explanation. Are there other log options to see what is going on?

Screen Shot 2021-06-04 at 12 59 51
mcaceresb commented 3 years ago

The log I mean is in the stata kernel cache folder. For me it's in ~/.stata_kernel_cache

michaelewens commented 3 years ago

Here it is saved (minor modifications) while the odbc load was waiting. console_debug-sent.log

mcaceresb commented 3 years ago

@michaelewens You can try set trace on as well and look at the tail end of the log. Might give a better clue.

michaelewens commented 3 years ago

Unfortunately that does not change the behavior of the log (even after a 20 minute wait). Is there any other place where I can see the kernel activity?