Closed SimonGreenhill closed 4 years ago
This is the new behavior!
It is there explicitly to catch these errors.
Go to line 222 in `pylexibank/providers/abvd.py` and change `add_lexemes` to `add_forms_from_value`.
That is the first fix, as `add_lexemes` is deprecated.
Ah, and if you think that the `?` is "normal" behavior, you can also fix this from within there by adding a line that checks whether the form is actually a question mark or not. I.e., you check `entry.name` and see if it is a valid entry.
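A minimal sketch of such a check, assuming entries expose the raw form as `name`. The `Entry` tuple and both helper functions are hypothetical stand-ins for illustration, not part of pylexibank or the ABVD provider:

```python
from collections import namedtuple

# Hypothetical stand-in for a provider entry; only `name` matters here.
Entry = namedtuple("Entry", ["word_id", "name"])


def is_valid_entry(entry):
    """Return False if the entry's form is empty or just a question mark."""
    name = (entry.name or "").strip()
    return bool(name) and name != "?"


def valid_entries(entries):
    """Keep only the entries that carry an actual form."""
    return [entry for entry in entries if is_valid_entry(entry)]
```

With this in place, an entry such as `Entry(57, "?")` would be skipped before any form is added.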
Ah, I get it now, sorry for not reading properly before. Something is wrong, I guess: the entry should not get past `clean_form`, which is called after splitting. `clean_form` yields an empty form here, which should then NOT be passed to `add_form`. And by default `clean_form` reacts specifically to `?` as a character.
So the fix we need is in the pylexibank code, in `dataset.py`:
    def split_forms(self, item, value):
        if value in self.lexemes:  # pragma: no cover
            self.log.debug('overriding via lexemes.csv: %r -> %r' % (value, self.lexemes[value]))
        value = self.lexemes.get(value, value)
        return [self.clean_form(item, form)
                for form in split_text_with_context(value, separators='/,;')]
needs to be modified to:
    def split_forms(self, item, value):
        if value in self.lexemes:  # pragma: no cover
            self.log.debug('overriding via lexemes.csv: %r -> %r' % (value, self.lexemes[value]))
        value = self.lexemes.get(value, value)
        forms = [self.clean_form(item, form)
                 for form in split_text_with_context(value, separators='/,;')]
        # drop forms that clean_form rejected (None or empty)
        return [f for f in forms if f]
Or similar. This will only return forms that are not `None` (or otherwise empty), and the `None` values are what is crashing the code right now.
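To see the effect in isolation, here is a small self-contained sketch. It does not use pylexibank: `clean_form` and `split_forms` below are simplified, hypothetical stand-ins, `MISSING_MARKERS` is an assumed set of placeholder values, and a plain `re.split` replaces `split_text_with_context`, so bracket handling is ignored:

```python
import re

# Assumed placeholder values that mean "no data"; not taken from pylexibank.
MISSING_MARKERS = {"?", "-", ""}


def clean_form(form):
    """Return the stripped form, or None if it only marks missing data."""
    form = form.strip()
    return None if form in MISSING_MARKERS else form


def split_forms(value, separators="/,;"):
    """Split a raw value, clean every part, and drop rejected parts."""
    parts = re.split("[" + re.escape(separators) + "]", value)
    cleaned = [clean_form(part) for part in parts]
    # Without this filter, a None would be handed on to the next step.
    return [form for form in cleaned if form]
```

A value of `"?"` then gives an empty list, so nothing is passed on, while `"mata / ?; matah"` keeps only the two real forms.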
I'd say this is a good example of why it was good to change the behavior of `add_lexemes` to be more transparent.
Just linked this as a bug in pylexibank.
Fixed.
... on this entry: word 57 = "?" here.
...which means that the following is passed to `add_form`:
... and then we fail with
What's the best way to fix this? Should `add_form` catch this, or should this be caught before getting to `add_form`?