nleguillarme / taxonerd

TaxoNERD : recognizing taxonomic entities using deep models
MIT License

errors using gbif_backbone entity linker #17

Open mpoelchau opened 1 year ago

mpoelchau commented 1 year ago

Thanks for publishing a really useful resource! I've used the Python version successfully with the NCBI entity linker, but when I use the gbif_backbone linker on the same dataset I get the stack trace below. Any pointers? I'm using Python 3.9.2.

$ taxonerd ask -m en_core_eco_biobert -l gbif_backbone -i reports/ -o reports/test_ann_gbif
Your CPU supports instructions that this binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2
For maximum performance, you can install NMSLIB from sources 
pip install --no-binary :all: nmslib
Traceback (most recent call last):
  File "/project/nal_genomics/mpoelchau/taxonerd-env/bin/taxonerd", line 8, in <module>
    sys.exit(main())
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/cli.py", line 111, in main
    cli()
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/cli.py", line 84, in ask
    nerd.load(ner_model, exclude=exclude, linker=link_to, threshold=thresh)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/taxonerd.py", line 68, in load
    self.nlp.add_pipe(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/spacy/language.py", line 801, in add_pipe
    pipe_component = self.create_pipe(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/spacy/language.py", line 680, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/confection/__init__.py", line 728, in resolve
    resolved, _ = cls._make(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/confection/__init__.py", line 777, in _make
    filled, _, resolved = cls._fill(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/confection/__init__.py", line 849, in _fill
    getter_result = getter(*args, **kwargs)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking.py", line 83, in __init__
    self.candidate_generator = candidate_generator or CandidateGenerator(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/candidate_generation.py", line 259, in __init__
    self.kb = kb or KnowledgeBaseFactory().get_kb(name)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 158, in get_kb
    return GbifKnowledgeBase()
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 178, in __init__
    super().__init__(file_path, prefix)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 86, in __init__
    self.conn = self.json_to_sqlite(file_path, db_path)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 99, in json_to_sqlite
    for concept in raw:
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 92, in <genexpr>
    raw = (json.loads(line) for line in open(cached_path(file_path)))
  File "/apps/python-3.9.2/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/apps/python-3.9.2/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/apps/python-3.9.2/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 145 (char 144)

nleguillarme commented 1 year ago

Hi @mpoelchau, thank you for using TaxoNERD. I've just tried with a fresh install of TaxoNERD (Python 3.9.17 and model en_core_eco_md, though I don't think the Python version or the model has anything to do with it), and it works just fine. Could you please send me an example of text that causes the error?
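
One thing that may help narrow this down: in the traceback above, the JSONDecodeError is raised inside json_to_sqlite, while the knowledge-base file is being read line by line with json.loads, so the failure seems to happen before any input text is processed. Below is a minimal sketch for checking such a file, assuming it is in JSON Lines format (one JSON object per line). The function name find_bad_jsonl_lines and the path in the usage comment are hypothetical and not part of TaxoNERD; the real location of the cached file depends on the local installation.

```python
import json


def find_bad_jsonl_lines(path, encoding="utf-8", limit=10):
    """Return (line_number, error_message) pairs for lines that json.loads rejects.

    Hypothetical helper, not part of TaxoNERD. Stops after `limit` bad lines.
    """
    bad = []
    # errors="replace" keeps the scan going even if some bytes are not valid
    # in the chosen encoding; such bytes become U+FFFD instead of raising.
    with open(path, encoding=encoding, errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, str(exc)))
                if len(bad) >= limit:
                    break
    return bad


# Usage (the path is a placeholder, replace it with the cached KB file):
# for lineno, message in find_bad_jsonl_lines("/path/to/cached_kb_file.jsonl"):
#     print(lineno, message)
```

If this reports bad lines, one possibility (an assumption, not something confirmed in this thread) is that the cached copy of the file is incomplete or corrupted, in which case deleting it and letting it download again might be worth trying.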