lfurrer / bconv

Python library for converting between BioNLP formats
MIT License

Bioc JSON OffsetWriter - TypeError: cannot unpack non-iterable int object #9

Closed: coree closed this issue 1 year ago.

coree commented 1 year ago

Hi Lenz, first of all, thank you for this great library!

I get an error when I try to write a BioC JSON document with byte_offsets=False:

Traceback (most recent call last):
  File "run_atm_eval.py", line 61, in <module>
    main(logger)
  File "run_atm_eval.py", line 46, in main
    miner.eval()
  File "/mnt/j/at/src/at/at.py", line 1293, in eval
    self.write_output(input_file_path)   
  File "/mnt/j/at/src/at/at.py", line 743, in write_output
    bconv.dump(self.doc, f, fmt=self.config["output"]["format"], byte_offsets=False)
  File "/home/j/at/lib/python3.7/site-packages/bconv/fmt/__init__.py", line 124, in dump
    exporter.write(content, dest)
  File "/home/j/at/lib/python3.7/site-packages/bconv/fmt/bioc.py", line 407, in write
    stream.writelines(json_iterencode(prep))
  File "/home/j/at/lib/python3.7/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/j/at/lib/python3.7/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/j/at/lib/python3.7/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/home/j/at/lib/python3.7/site-packages/bconv/util/iterate.py", line 104, in jsonable_iterator
    first = next(o)
  File "/home/j/at/lib/python3.7/site-packages/bconv/fmt/bioc.py", line 420, in <genexpr>
    ('documents', (self._document(d) for d in coll))
  File "/home/j/at/lib/python3.7/site-packages/bconv/fmt/bioc.py", line 429, in _document
    'passages': [self._passage(s, offset_mngr) for s in doc],
  File "/home/j/at/lib/python3.7/site-packages/bconv/fmt/bioc.py", line 429, in <listcomp>
    'passages': [self._passage(s, offset_mngr) for s in doc],
  File "/home/j/at/lib/python3.7/site-packages/bconv/fmt/bioc.py", line 450, in _passage
    annotations.append(self._entity(entity, offset_mngr))
  File "/home/j/at/lib/python3.7/site-packages/bconv/fmt/bioc.py", line 483, in _entity
    for start, length in offset_mngr.entity(entity)]
  File "/home/j/at/lib/python3.7/site-packages/bconv/fmt/bioc.py", line 483, in <listcomp>
    for start, length in offset_mngr.entity(entity)]
TypeError: cannot unpack non-iterable int object

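I think the problem is that OffsetWriter's entity method does return entity.start, entity.end-entity.start, i.e. it returns a single (start, length) tuple. The caller in _entity, however, iterates over the result and unpacks each item as a (start, length) pair. Iterating over the returned tuple yields plain ints, which cannot be unpacked.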

https://github.com/lfurrer/bconv/blob/f7418a8fdb772ca1b086c52e6db57a2758b82c44/bconv/fmt/bioc.py#L589
https://github.com/lfurrer/bconv/blob/f7418a8fdb772ca1b086c52e6db57a2758b82c44/bconv/fmt/bioc.py#L604

Here is a minimal reproduction of the suspected problem:

>>> def a():
...     yield 0, 2 

>>> def b():
...     return 0, 2

>>> [dict(offset=i,length=j) for i,j in a()]
[{'offset': 0, 'length': 2}]

>>> [dict(offset=i,length=j) for i,j in b()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
TypeError: cannot unpack non-iterable int object

Or maybe I am misunderstanding something...
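The diagnosis above suggests a fix along these lines. This is a minimal sketch, assuming the caller's contract is "an iterable of (start, length) pairs", as the for start, length in offset_mngr.entity(entity) line in the traceback indicates. Entity, CharOffsetWriter and bioc_locations are hypothetical stand-ins for illustration, not bconv's actual classes or the patch that was eventually applied.

```python
from typing import Iterator, NamedTuple, Tuple


class Entity(NamedTuple):
    """Hypothetical stand-in for an entity with character offsets."""
    start: int
    end: int


class CharOffsetWriter:
    """Hypothetical character-offset manager (not the real bconv class)."""

    def entity(self, entity: Entity) -> Iterator[Tuple[int, int]]:
        # Yield one (start, length) pair instead of returning a bare tuple,
        # so the caller can unpack every item it iterates over.
        yield entity.start, entity.end - entity.start


def bioc_locations(mngr, entity):
    # Mirrors the list comprehension in _entity from the traceback.
    return [dict(offset=start, length=length)
            for start, length in mngr.entity(entity)]


locs = bioc_locations(CharOffsetWriter(), Entity(3, 7))
print(locs)  # [{'offset': 3, 'length': 4}]
```

Returning a one-element list, [(start, length)], would satisfy the caller just as well; yield matches the a() example in the reproduction.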

lfurrer commented 1 year ago

Thanks @coree for reporting this – and sorry for not noticing earlier.

It looks like you found a nasty refactoring omission, and apparently the unit tests don't cover the parameter setting byte_offsets=False.
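A regression test that exercises both offset modes would have caught this. The sketch below is hypothetical: CharOffsetWriter, ByteOffsetWriter and locations are simplified stand-ins (not bconv's internals), and it assumes that byte_offsets=True means UTF-8 byte offsets while byte_offsets=False means character offsets, as the parameter name suggests. Non-ASCII text is used so the two modes produce different numbers.

```python
from typing import Iterator, NamedTuple, Tuple


class Entity(NamedTuple):
    """Hypothetical entity: character offsets into the passage text."""
    start: int
    end: int


class CharOffsetWriter:
    """Hypothetical writer for byte_offsets=False (character offsets)."""

    def entity(self, entity: Entity) -> Iterator[Tuple[int, int]]:
        yield entity.start, entity.end - entity.start


class ByteOffsetWriter:
    """Hypothetical writer for byte_offsets=True (UTF-8 byte offsets)."""

    def __init__(self, text: str):
        self.text = text

    def entity(self, entity: Entity) -> Iterator[Tuple[int, int]]:
        # Convert character positions to UTF-8 byte positions.
        start = len(self.text[:entity.start].encode("utf-8"))
        length = len(self.text[entity.start:entity.end].encode("utf-8"))
        yield start, length


def locations(writer, entity):
    # Same unpacking pattern as the failing list comprehension.
    return [dict(offset=offset, length=length)
            for offset, length in writer.entity(entity)]


def test_both_offset_modes():
    text = "αβ gene"  # α and β take 2 bytes each in UTF-8
    ent = Entity(3, 7)  # "gene"
    # Both settings must produce unpackable (offset, length) pairs.
    assert locations(CharOffsetWriter(), ent) == [{"offset": 3, "length": 4}]
    assert locations(ByteOffsetWriter(text), ent) == [{"offset": 5, "length": 4}]


test_both_offset_modes()
```

In a real test suite, the equivalent check would presumably go through bconv.dump, once with byte_offsets=True and once with byte_offsets=False, on a document containing non-ASCII text.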