ufal / lindat-kontext

An alternative web front-end for the Manatee corpus search engine
GNU General Public License v2.0

default_syntax_viewer behavior for `/` in attrs #215

Open kosarko opened 5 years ago

kosarko commented 5 years ago

@tomachalek: we are hitting this exception:

2018-11-26 14:39:16,535 [controller] ERROR: sequence index must be integer, not 'unicode'
Traceback (most recent call last):
  File "/opt/kontext/deploy/source/public/../lib/controller/__init__.py", line 798, in run
    tmpl, result = self.process_action(methodname, named_args)
  File "/opt/kontext/deploy/source/public/../lib/controller/__init__.py", line 864, in process_action
    method_ans = method(self._request)
  File "/opt/kontext/deploy/source/public/../lib/plugins/default_syntax_viewer/__init__.py", line 70, in get_syntax_data
    int(request.args.get('kwic_len')))
  File "/opt/kontext/deploy/source/public/../lib/plugins/default_syntax_viewer/__init__.py", line 87, in search_by_token_id
    data, encoder = self._backend.get_data(corp, corpname, token_id, kwic_len)
  File "/opt/kontext/deploy/source/public/../lib/plugins/default_syntax_viewer/manatee_backend.py", line 507, in get_data
    self._decode_tree_data(parsed_data, conf.parent_attr, conf.attr_refs)
  File "/opt/kontext/deploy/source/public/../lib/plugins/default_syntax_viewer/manatee_backend.py", line 483, in _decode_tree_data
    abs_parents = self._get_abs_reference(i, data[i], parent_attr)
  File "/opt/kontext/deploy/source/public/../lib/plugins/default_syntax_viewer/manatee_backend.py", line 442, in _get_abs_reference
    if item[ref_attr]:
TypeError: sequence index must be integer, not 'unicode'

It shows up when you try to display certain trees. We've traced it to `item` being a BackendDataParseException instead of the "normal" data item one would expect.
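To illustrate the failure mode: subscripting an exception object with an attribute name is what blows up in `_get_abs_reference`. Below is a minimal sketch of a guard, not the plugin's actual code. The class is a stand-in with the same name as the real one, and `get_ref` is a hypothetical helper that re-raises the stored parse error instead of indexing into it.

```python
class BackendDataParseException(Exception):
    """Stand-in for the plugin's exception class; the real definition may differ."""


def get_ref(item, ref_attr):
    # Hypothetical guard: if parsing stored an exception in place of the
    # token data, re-raise it rather than treating it as a mapping.
    if isinstance(item, BackendDataParseException):
        raise item
    return item[ref_attr]


# One well-formed item and one item replaced by a parse error.
items = [{'parent': '1'}, BackendDataParseException('unexpected number of fields')]

print(get_ref(items[0], 'parent'))
try:
    get_ref(items[1], 'parent')
except BackendDataParseException as err:
    print('parse error: %s' % err)
```

A guard like this would not fix the parsing itself, but it would make the log point at the parse error rather than at a confusing TypeError.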

The exception object gets there from here: https://github.com/ufal/lindat-kontext/blob/7289e2076de7f6d8590acef7b21403f877c9eb52/lib/plugins/default_syntax_viewer/manatee_backend.py#L422. The problem is the split a few lines above, `parsed = [import_raw_val(x) for x in in_data[i + 2].split('/')]`: one of our attributes has the value `LGloss=(zvr._zájmeno/částice)`, so the split produces more fields than expected.
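A small standalone reproduction of the field-count mismatch, assuming a token whose attributes are joined with `/`. The two-attribute layout and the names `EXPECTED_ATTRS` and `split_fields` are made up for illustration; only the `LGloss` value and the `split('/')` call come from the report above.

```python
# Hypothetical attribute layout: two '/'-separated attributes per token.
EXPECTED_ATTRS = ['feats', 'parent']


def split_fields(raw):
    """Split a '/'-joined attribute string the same way the quoted line does."""
    return raw.split('/')


# The first attribute value itself contains a '/', as in our data.
raw_token = 'LGloss=(zvr._zájmeno/částice)' + '/' + '3'
fields = split_fields(raw_token)

print('expected %d fields, got %d' % (len(EXPECTED_ATTRS), len(fields)))
print(fields)
```

The value is cut in two at its embedded `/`, so every following field shifts by one position and no longer lines up with the configured attributes.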

Can this be tackled just by configuration or data preparation? How are word forms or lemmata containing `/` handled? They seem to work for you somehow: https://kontext.korpus.cz/view?ctxattrs=word&attr_vmode=mouseover&pagesize=40&refs=%3Ddoc.title&q=~Awk9wAk15I17&viewmode=kwic&attrs=word&corpname=syn2015&attr_allpos=all