moj-analytical-services / splink_demos

Interactive notebooks containing demonstration code of the splink library

splink_demos

This repo contains interactive notebooks with demonstrations and tutorials for version 3 of the Splink record linking library, the homepage for which is here.

Running these notebooks interactively

You can run these notebooks in an interactive Jupyter environment by clicking the button below:

Binder

Running these notebooks locally in VSCode

If you don't already have it, you'll need to install Java on your system in order to run PySpark, which Splink currently depends on. Download Java for your specific OS from here.

You can check that the installation succeeded by running the following in a terminal:

java -version

It should return details of your Java installation.
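The same check can be scripted, which is handy when a notebook fails and it is unclear whether Java is the cause. The following is a minimal sketch, not part of this repo; the function name `java_version_output` is a hypothetical helper. It relies only on the Python standard library.

```python
import shutil
import subprocess


def java_version_output():
    """Return the text printed by `java -version`, or None if java is not on PATH."""
    java_path = shutil.which("java")
    if java_path is None:
        return None
    # `java -version` writes its report to stderr, so capture both streams.
    completed = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (completed.stdout + completed.stderr).strip()


if __name__ == "__main__":
    report = java_version_output()
    print(report if report is not None else "java was not found on PATH")
```

If the script reports that java was not found, install Java or fix your PATH before continuing.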

If you have multiple Java installations, you may need to change which version of Java is currently active.
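One way to select a specific installation for a single Python session is to set `JAVA_HOME` and put its `bin` directory first on `PATH` before any Spark session is created. This is a sketch under the assumption that Spark's launcher scripts honour `JAVA_HOME`, which is generally the case; `use_java_home` is a hypothetical helper and the path you pass in depends entirely on your system.

```python
import os
from pathlib import Path


def use_java_home(java_home):
    """Point this process (and any child processes) at one Java installation."""
    java_home = Path(java_home)
    bin_dir = java_home / "bin"
    if not bin_dir.is_dir():
        raise FileNotFoundError(f"No bin directory under {java_home}")
    os.environ["JAVA_HOME"] = str(java_home)
    # Prepend so this installation's `java` is found before any other on PATH.
    os.environ["PATH"] = str(bin_dir) + os.pathsep + os.environ.get("PATH", "")
    return bin_dir
```

Call it at the top of a notebook, for example `use_java_home("/path/to/your/jdk")` with the placeholder replaced by a real installation directory. The change only affects the current process, so your system-wide default is left alone.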

To download the example notebooks, simply clone this repository:

git clone git@github.com:moj-analytical-services/splink_demos.git

Create a virtual environment using:

python3 -m venv venv
source venv/bin/activate

Install the package list (which includes pyspark) with:

pip3 install -r requirements.txt

and, if you want to use Jupyter, add a kernel corresponding to your venv:

python -m ipykernel install --user --name=splink_demos
jupyter lab
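To confirm that the kernel you selected is really using the virtual environment, you can check from inside a notebook that the expected packages are importable. This is a rough sketch: `missing_packages` is a hypothetical helper, and the package names listed (`pyspark`, `splink`, `ipykernel`) are an assumption about what `requirements.txt` installs, so adjust them to match the file.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list; edit to match requirements.txt.
    missing = missing_packages(["pyspark", "splink", "ipykernel"])
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

If packages are reported missing, the notebook is probably running on a different kernel from the one registered above.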