NER error after loading a CONLL-U document: doc.text is None

I get the following error when running NER: TypeError: 'NoneType' object is not subscriptable

After debugging, I found that the NER code tries to access the document's text attribute, but it is None. I'm loading the document from a CONLL-U file created with Stanza, using the function stanza.utils.conll.conll2doc. It seems that documents loaded this way don't get their text attribute set. Each sentence has its own text, but the document as a whole does not, and Stanza reads the document-level text when it creates the entity spans.

Is it possible to build the document's text from the sentences? That would fix the problem, I guess.
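To illustrate what I mean, here is a minimal sketch of rebuilding a document string from token character offsets. It does not use Stanza itself: Tok is a hypothetical stand-in for a token that carries text, start_char and end_char, and rebuild_text is a made-up helper name. It assumes the loaded tokens still have their offsets, which may not hold for every CONLL-U file.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Tok:
    """Hypothetical stand-in for a token with character offsets."""
    text: str
    start_char: int
    end_char: int


def rebuild_text(tokens: Iterable[Tok], filler: str = " ") -> str:
    """Rebuild a string such that text[t.start_char:t.end_char] == t.text.

    Gaps between tokens are padded with `filler`, because the original
    whitespace cannot be recovered from the offsets alone.
    """
    pieces: List[str] = []
    cursor = 0
    for tok in sorted(tokens, key=lambda t: t.start_char):
        if tok.start_char < cursor:
            raise ValueError(f"overlapping offsets at {tok!r}")
        if tok.end_char - tok.start_char != len(tok.text):
            raise ValueError(f"offsets do not match text length at {tok!r}")
        # Pad the gap before this token, then append the token itself.
        pieces.append(filler * (tok.start_char - cursor))
        pieces.append(tok.text)
        cursor = tok.end_char
    return "".join(pieces)


tokens = [
    Tok("Hello", 0, 5),
    Tok("world", 6, 11),
    Tok(".", 11, 12),
    Tok("Bye", 14, 17),
]
text = rebuild_text(tokens)
# Every token's offsets now slice back to its own text.
assert all(text[t.start_char:t.end_char] == t.text for t in tokens)
```

If something like this were applied to the loaded document before running NER, the slice in init_from_tokens would have a string to work on. Whether the text attribute can simply be assigned from outside is an assumption I have not verified.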

This is the entire stack trace:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/zbeloki/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/home/zbeloki/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/home/zbeloki/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/zbeloki/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
  File "/home/zbeloki/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/home/zbeloki/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "stanza/prepare_eval_data.py", line 59, in <module>
    main(args)
  File "stanza/prepare_eval_data.py", line 30, in main
    doc = nlp(doc_tokenized)
  File "/home/zbeloki/workspace/nlp_processors_evaluation/venv/lib/python3.10/site-packages/stanza/pipeline/core.py", line 480, in __call__
    return self.process(doc, processors)
  File "/home/zbeloki/workspace/nlp_processors_evaluation/venv/lib/python3.10/site-packages/stanza/pipeline/core.py", line 431, in process
    doc = process(doc)
  File "/home/zbeloki/workspace/nlp_processors_evaluation/venv/lib/python3.10/site-packages/stanza/pipeline/ner_processor.py", line 123, in process
    total = len(batch.doc.build_ents())
  File "/home/zbeloki/workspace/nlp_processors_evaluation/venv/lib/python3.10/site-packages/stanza/models/common/doc.py", line 433, in build_ents
    s_ents = s.build_ents()
  File "/home/zbeloki/workspace/nlp_processors_evaluation/venv/lib/python3.10/site-packages/stanza/models/common/doc.py", line 752, in build_ents
    self.ents.append(Span(tokens=ent_tokens, type=e['type'], doc=self.doc, sent=self))
  File "/home/zbeloki/workspace/nlp_processors_evaluation/venv/lib/python3.10/site-packages/stanza/models/common/doc.py", line 1601, in __init__
    self.init_from_tokens(tokens, type)
  File "/home/zbeloki/workspace/nlp_processors_evaluation/venv/lib/python3.10/site-packages/stanza/models/common/doc.py", line 1618, in init_from_tokens
    self.text = self.doc.text[self.start_char:self.end_char]
TypeError: 'NoneType' object is not subscriptable

stanfordnlp/stanza, issue #1428