This PR fixes a spaCy v3.x incompatibility in the viewer module. The proposed fix works with both spaCy v2.3.x and v3.x, so current and new users should be able to do clean installs of checklist>=0.0.10 with either version and visualize their results without errors.
Steps to reproduce the issue
Do a clean install of checklist==0.0.10 (the latest version) in a virtual environment:
```shell
git clone git@github.com:marcotcr/checklist.git
cd checklist
pip install -e .
```
Because setup.py only requires spacy>2.2, this installs spaCy 3.x. However, with the tokenizer defined the way it currently is in the viewer module, visualizing any Checklist workflow in a Jupyter notebook fails with the following error:
```
~/test_checklist_error/.env/lib/python3.8/site-packages/checklist/viewer/test_summarizer.py in __init__(self, test_summary, testcases, **kwargs)
     33         nlp = English()
     34         # ONLY do tokenization here
---> 35         self.tokenizer = nlp.Defaults.create_tokenizer(nlp)
     36 
     37         self.max_return = 10

AttributeError: type object 'EnglishDefaults' has no attribute 'create_tokenizer'
```
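To confirm the root cause in an existing environment, you can check the installed spaCy version and whether the English language defaults still expose `create_tokenizer`. This is a minimal sketch that assumes only that spaCy is importable; the helper name `has_legacy_create_tokenizer` is hypothetical. On spaCy 2.3.x the attribute is present, while on 3.x it is gone, which is what the traceback above reports.

```python
import spacy
from spacy.lang.en import English


def has_legacy_create_tokenizer() -> bool:
    """Return True if English.Defaults still provides create_tokenizer (spaCy 2.x only)."""
    return hasattr(English.Defaults, "create_tokenizer")


if __name__ == "__main__":
    print("spaCy version:", spacy.__version__)
    # On spaCy 3.x this prints False, matching the AttributeError above
    print("Defaults.create_tokenizer available:", has_legacy_create_tokenizer())
```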
By visualization, I mean the visual summary table produced by the test suite, for example:
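```python
from checklist.test_suite import TestSuite

suite = TestSuite()
# ... add tests to the suite here ...
# predictor_fn: your model's function, returning (predictions, confidences)
suite.run(predictor_fn, overwrite=True)
suite.visual_summary_table()  # raises the AttributeError above on spaCy 3.x
```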
Fix
To address this, this PR replaces the call to nlp.Defaults.create_tokenizer(nlp), which no longer exists in spaCy 3.x, with the more generic nlp.tokenizer attribute, which is available in both 2.3.x and 3.x (as per the spaCy docs).
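A minimal sketch of the change, written as a standalone helper so it can run outside the viewer; the function name `make_tokenizer` is hypothetical and assumes the surrounding code in `TestSummarizer.__init__` stays as shown in the traceback. A blank `English()` pipeline already carries a rule-based tokenizer as `nlp.tokenizer`, so no separate factory call is needed.

```python
from spacy.lang.en import English


def make_tokenizer():
    """Build a tokenization-only spaCy tokenizer.

    Mirrors the proposed change in checklist/viewer/test_summarizer.py:
    instead of nlp.Defaults.create_tokenizer(nlp), which was removed in
    spaCy 3.x, reuse the tokenizer attribute that a blank English pipeline
    exposes in both 2.3.x and 3.x.
    """
    nlp = English()
    # ONLY do tokenization here
    return nlp.tokenizer


if __name__ == "__main__":
    tokenizer = make_tokenizer()
    print([token.text for token in tokenizer("Hello world!")])
```

In the viewer itself, the equivalent one-line change is `self.tokenizer = nlp.tokenizer` in place of `self.tokenizer = nlp.Defaults.create_tokenizer(nlp)`.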
I've pushed the fix to my fork, which can be used for testing:
```shell
git clone git@github.com:prrao87/checklist.git
cd checklist
pip install -e .
```
With the fix, a Checklist workflow (including tokenization, MFTs, INV and DIR tests) runs successfully regardless of whether spaCy 2.3.x or 3.x is installed.
Tests
I tested the fix on a complete end-to-end Checklist 0.0.10 workflow with both spaCy 2.3.2 and 3.0.5 (the latest release at the time of writing). The tokenizer works as intended in both versions, and the visual summary table renders without errors.
Hope this works!