mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

regression: no model for tags type=default #564

Closed: bertsky closed this issue 6 months ago

bertsky commented 6 months ago

In trying to reproduce https://github.com/mittagessen/kraken/issues/525, I encountered this:

kraken --input 1694884104_0010.jpg 1694884104_0010.txt segment --baseline ocr --model $XDG_DATA_HOME/ocrd-resources/ocrd-kraken-recognize/austriannewspapers.mlmodel
Loading ANN /data/ocr-d/kraken/kraken/blla.mlmodel  ✓
Loading ANN /data/ocr-d/ocrd_all/venv38/share/ocrd-resources/ocrd-kraken-recognize/austriannewspapers.mlmodel   ✓
Segmenting  ✓
Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/81 -:--:-- 0:00:00
[01/17/24 11:27:07] ERROR    Failed processing 1694884104_0010.jpg: No model for tags {'type': 'default'}                       kraken.py:426

Looks like the cause is somewhere in 8ff27d1e76ca915cf995bcf556b377ab34a32ef9.

bertsky commented 6 months ago

But if I switch the branching of the conditional:

--- a/kraken/rpred.py
+++ b/kraken/rpred.py
@@ -336,10 +336,10 @@ def _resolve_tags_to_model(tags: Optional[Sequence[Dict[str, str]]],
     """
     if not tags and default:
         return ('type', 'default'), default
+    elif tags and default:
+        return next(tags.values()), default
     elif tags:
         for tag in tags.items():
             if tag in model_map:
                 return tag, model_map[tag]
-    elif tags and default:
-        return next(tags.values()), default
     raise KrakenInputException(f'No model for tags {tags}')

then I get

'dict_values' object is not an iterator

so the moved line also needs an extra iter(...).
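
The error above comes from plain Python behaviour rather than anything kraken-specific: a dict view is iterable, but it is not itself an iterator, so it has to be wrapped in iter(...) before next(...) can be called on it. A minimal sketch, assuming tags is an ordinary dict like the one in the error message:

```python
# Assumption: tags is a plain dict, as in the error message of this issue.
tags = {'type': 'default'}

# Calling next() directly on a dict view raises TypeError,
# because dict_values is iterable but not an iterator.
try:
    next(tags.values())
except TypeError as exc:
    message = str(exc)

# Wrapping the view in iter() yields a real iterator.
first_value = next(iter(tags.values()))  # only the value
first_item = next(iter(tags.items()))    # the (key, value) pair
```

Note that first_value is a bare string while the keys of a (key, value)-based lookup are tuples, so only first_item has the same shape as the ('type', 'default') tuple returned by the first branch.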

But then, still, I get lots of segmentation errors:

Tensor conversion failed with 'default'. Emitting empty record.

and the text result is empty.

bertsky commented 6 months ago

thx!

mittagessen commented 6 months ago

On 24/01/17 04:15AM, Robert Sachunsky wrote:

> But if I switch the branching of the conditional:
>
> --- a/kraken/rpred.py
> +++ b/kraken/rpred.py
> @@ -336,10 +336,10 @@ def _resolve_tags_to_model(tags: Optional[Sequence[Dict[str, str]]],
>      """
>      if not tags and default:
>          return ('type', 'default'), default
> +    elif tags and default:
> +        return next(tags.values()), default
>      elif tags:
>          for tag in tags.items():
>              if tag in model_map:
>                  return tag, model_map[tag]
> -    elif tags and default:
> -        return next(tags.values()), default
>      raise KrakenInputException(f'No model for tags {tags}')

I've fixed it in main. Your change isn't correct, as the last check (falling back to the default model) should run when the elif tags branch finds no matching model and falls through.
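
To illustrate the fall-through behaviour described here, the following is a rough sketch and not the actual code in main. It assumes tags is a plain dict and model_map is keyed by (key, value) tuples; resolve_tags_to_model and NoModelError are hypothetical stand-ins for kraken's _resolve_tags_to_model and KrakenInputException.

```python
from typing import Any, Dict, Optional, Tuple

Tag = Tuple[str, str]


class NoModelError(Exception):
    """Hypothetical stand-in for kraken's KrakenInputException."""


def resolve_tags_to_model(tags: Optional[Dict[str, str]],
                          model_map: Dict[Tag, Any],
                          default: Optional[Any] = None) -> Tuple[Tag, Any]:
    # No tags at all: use the default model under a placeholder tag.
    if not tags and default is not None:
        return ('type', 'default'), default
    if tags:
        # Prefer a model explicitly mapped to one of the (key, value) pairs.
        for tag in tags.items():
            if tag in model_map:
                return tag, model_map[tag]
        # No mapped model matched: fall through to the default if there is one.
        if default is not None:
            return next(iter(tags.items())), default
    raise NoModelError(f'No model for tags {tags}')
```

In an if/elif chain, a later `elif tags and default` can never be reached once `elif tags` has been entered, which is why the lookup fails for {'type': 'default'} even though a default model was supplied. Nesting the default check after the loop keeps the priority of mapped models while still allowing the fallback.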

> But then, still, I get lots of segmentation errors:
>
> Tensor conversion failed with 'default'. Emitting empty record.
>
> and the text result is empty.

Also fixed in main. I've switched from using only the value of tags to (key, value) pairs, so the next(iter(...)) part was also incorrect.

PS: Regarding the scikit-image thing: I was mistaken. Newer scikit-image up to 0.21.x works with scipy 1.10.x, as the simplices attribute existed concurrently with the vertices one that has since been removed. Current main has the requisite patches for py38 compatibility.