moj-analytical-services / splink_demos

Interactive notebooks containing demonstration code of the splink library

ModuleNotFoundError: No module named 'splink.duckdb' #54

Closed msiemionCalistapw closed 2 years ago

msiemionCalistapw commented 2 years ago

Hi Robin,

Very excited about your Splink code! I tried to run the Quickstart code in jupyter notebook and ran into a no module error for the following line of code:

from splink.duckdb.duckdb_linker import DuckDBLinker
from splink.duckdb.duckdb_comparison_library import (
    exact_match,
    levenshtein_at_thresholds,

.....

Please let me know if you can help with the setup of splink modules for record linkage.

Thanks,

Michayla

ThomasHepworth commented 2 years ago

Hiya 👋

If you just want to play around with splink and our demo examples, you can use the binder link which will boot up a jupyterlab window w/ the required modules:

Binder

If you particularly want to run splink locally in jupyter, it sounds like you'll need to set up a kernel. It's a bit of a pain/slightly daunting if you haven't done so before, unfortunately.

I'd need to know your OS and what you use to create virtual environments to give you a specific set of instructions.

In lieu of that, you can check out the python guide on venvs for some basic info on setting something up.

If you can get that working, you simply need to use the following in the terminal to create a kernel (renaming the kernel):

pip install ipykernel
python -m ipykernel install --user --name=<name_of_kernel>
ThomasHepworth commented 2 years ago

Oh, I completely forgot I updated the README a while ago too.

Some info on setting up your environment can be found here.

msiemionCalistapw commented 2 years ago

Hi there, thanks for all of the info! I have created a virtual environment via Anaconda Prompt and successfully activated it in Jupyter. Then I went into myenv in Jupyter and tried the code again, and I'm getting the same no module found error :/ I must be doing something wrong but I'm not sure what. Working on Windows btw!

ThomasHepworth commented 2 years ago

Are you setting your kernel inside the jupyter terminal?

Screenshot 2022-08-04 at 10 06 29
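A quick way to confirm which environment a notebook is really running in is to ask the kernel itself. The snippet below is a minimal sketch, not part of the original exchange: `kernel_environment` is a hypothetical helper name, and `CONDA_DEFAULT_ENV` is only populated when a conda environment was activated in the process that launched the kernel, so it may print `None`.

```python
import os
import sys


def kernel_environment():
    """Collect basic facts about the interpreter running this kernel."""
    return {
        # Full path of the Python binary the kernel is using.
        "executable": sys.executable,
        # Root directory of the environment that binary belongs to.
        "prefix": sys.prefix,
        # Name of the active conda env, if one was activated (assumption:
        # this variable is absent for plain venvs and unactivated kernels).
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in kernel_environment().items():
    print(f"{key}: {value}")
```

If the printed paths do not point inside the environment you created (for example `myenv`), the notebook is probably still attached to a different kernel.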

msiemionCalistapw commented 2 years ago

Sorry, I'll give you some more info! I set up the virtual environment in Anaconda Prompt using:

conda create -n myenv python=3.9
conda activate myenv
pip install --user ipykernel
python -m ipykernel install --user --name=myenv

Then in Jupyter:

[image: envJupe]
ThomasHepworth commented 2 years ago

Ah, I think I know why then, thanks. It looks like we haven't updated our conda release for over a year now, so I'm guessing it's simply down to that.

Can you check which version of splink you've currently got installed?
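One way to check this from a notebook cell, sketched here with the standard library only: `installed_version` is a hypothetical helper name, and it returns `None` rather than raising when the package is not installed in the kernel's environment.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the environment this kernel uses.
        return None


print("splink version:", installed_version("splink"))
```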

msiemionCalistapw commented 2 years ago

looks like 2.1.14


ThomasHepworth commented 2 years ago

Ah perfect, thanks for bringing this to our attention. We'll try to update it over the next week.

For now, can you pip install splink so you can use the latest version?

%pip install splink -U
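After upgrading (and restarting the kernel), you could confirm that the submodule from the original error is now discoverable. This is a minimal sketch; `has_module` is a hypothetical helper name, and it only tests whether the module can be located, not that every import in the Quickstart will succeed.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if `dotted_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # find_spec imports parent packages first, so a missing or broken
        # parent (e.g. `splink` itself) surfaces here as an ImportError.
        return False


print("splink.duckdb available:", has_module("splink.duckdb"))
```

If this still prints `False`, the kernel is most likely using an environment where the older release is installed.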

msiemionCalistapw commented 2 years ago

That did it! Thank you so much!

Just curious… what is the reasoning for needing the virtual environment?

ThomasHepworth commented 2 years ago

For the most part, it's just for better package management.

In short, package management software:

This helps ensure your code is reproducible in both the present and the future. It also means projects don't get cluttered by unnecessary packages as easily.

For the jupyter case, it also means you don't have to reinstall the packages every time the kernel gets restarted.

msiemionCalistapw commented 2 years ago

makes sense, thanks!!