moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
https://moj-analytical-services.github.io/splink/
MIT License

ModuleNotFoundError: No module named "pyarrow" #2499

Closed lubrst closed 3 weeks ago

lubrst commented 3 weeks ago

What happens?

Line 76 triggers a ModuleNotFoundError if "pyarrow" is not installed.

https://github.com/moj-analytical-services/splink/blob/77c96b819c76060aafc03c4a1786841a560fb3c5/splink/internals/duckdb/database_api.py#L70-L86

It seems like line 76 could be removed, because the same import is repeated in the try-except clause that follows it.
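For illustration, here is a minimal sketch of the guarded optional-import pattern being suggested. It is not a copy of splink's code: `load_optional` and `is_arrow_table` are hypothetical names used only for this example. The point is that the import should only happen inside the try-except, so a missing package results in a fallback rather than an exception.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration; not part of splink.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of propagating to the caller.
        return None


# pyarrow is treated as optional: pa is None when it is not installed.
pa = load_optional("pyarrow")


def is_arrow_table(obj):
    """Check for a pyarrow Table without requiring pyarrow to be present."""
    return pa is not None and isinstance(obj, pa.Table)
```

An unguarded `import pyarrow` placed before such a try-except defeats it, because the exception is raised before the guarded import is ever reached.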

To Reproduce

Use DuckDB as the database backend in an environment where "pyarrow" is not installed.

OS:

Linux

Splink version:

4.0.5

Have you tried this on the latest master branch?

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

RobinL commented 3 weeks ago

Good spot, thanks. Feel free to PR or we'll fix this soon