lfurrer / bconv

Python library for converting between BioNLP formats
MIT License

KeyError while exporting CHEMDNER to CoNLL #4

Closed: navanchauhan closed this issue 3 years ago

navanchauhan commented 3 years ago
# wget --no-check-certificate https://biocreative.bioinformatics.udel.edu/media/store/files/2014/chemdner_corpus.tar.gz
import bconv

coll = bconv.load('chemdner_corpus/training.bioc.xml', fmt='bioc_xml', byte_offsets=False)
with open('t1.conll', 'w') as f:
    bconv.dump(coll, f, fmt="conll")

Error dump

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-54-05b63343e33b> in <module>()
      1 with open('t1.conll', 'w') as f:
----> 2   bconv.dump(coll, f,fmt="conll")

... (6 intermediate frames omitted) ...
/usr/local/lib/python3.6/dist-packages/bconv/fmt/conll.py in <genexpr>(.0)
    204         with self._entities_by_pos(sentence.entities) as entities:
    205             for token in sentence:
--> 206                 label = ';'.join(e.info[self.label] for e in entities.send(token))
    207                 yield label
    208 

KeyError: 'type'
lfurrer commented 3 years ago

https://github.com/lfurrer/bconv/wiki/CoNLL#exporters

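Specifically, have a look at the label option. It selects which entity attribute is written out as the tag, and it defaults to "type", which is why the lookup fails here. With this corpus you'll want to set it to "class".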
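In practice that means passing the option through to the exporter, e.g. bconv.dump(coll, f, fmt="conll", label="class"), assuming (as the wiki page describes) that format options are accepted as keyword arguments of dump. Why the default breaks can be mimicked in plain Python with the expression from the traceback. The Entity class below is a hypothetical stand-in, not bconv's own type, and "TRIVIAL" is only an assumed example of a CHEMDNER class value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for an exported entity with an info mapping."""
    text: str
    info: Dict[str, str] = field(default_factory=dict)


def token_label(entities: List[Entity], label: str = "type") -> str:
    """Join the chosen info field of all entities covering a token,
    mirroring line 206 of the traceback."""
    return ";".join(e.info[label] for e in entities)


# Assumed CHEMDNER-style annotation: the category is stored under "class".
ents = [Entity("aspirin", {"class": "TRIVIAL"})]

try:
    token_label(ents)  # default label "type" -> KeyError, as in the report
except KeyError as err:
    print("KeyError:", err)

print(token_label(ents, label="class"))  # prints TRIVIAL
```

Running it prints the same KeyError: 'type' as in the report, followed by TRIVIAL once label="class" is used.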

navanchauhan commented 3 years ago

Ahh, I don't know how I missed that page of the documentation. Thank you.

lfurrer commented 3 years ago

I admit that the error message isn't very helpful. I'll push a fix for that later.
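One way a clearer message could look is sketched below. This is a hypothetical illustration, not the fix that was actually pushed to bconv: the names lookup_label, token_label and MissingLabelError are made up for the example, and entities are reduced to plain info dicts.

```python
from typing import Iterable, Mapping


class MissingLabelError(KeyError):
    """KeyError subclass whose message prints without KeyError's repr quotes."""

    def __str__(self) -> str:
        return str(self.args[0])


def lookup_label(info: Mapping[str, str], label: str) -> str:
    """Return info[label], or raise an error naming the missing key,
    the keys that are present, and the option to change."""
    try:
        return info[label]
    except KeyError:
        available = ", ".join(sorted(info)) or "(none)"
        raise MissingLabelError(
            f"entity has no {label!r} field (available: {available}); "
            "set the exporter's label option to one of these"
        ) from None


def token_label(infos: Iterable[Mapping[str, str]], label: str = "type") -> str:
    """Join the chosen field of every entity covering one token."""
    return ";".join(lookup_label(info, label) for info in infos)


try:
    token_label([{"class": "TRIVIAL"}])  # default label "type" is missing
except MissingLabelError as err:
    print(err)
```

Subclassing KeyError keeps existing except KeyError handlers working, while overriding __str__ avoids the extra quotes KeyError normally puts around its message.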