mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

MultiPolygons are no longer iterable in Shapely 2.0 #419

Closed: bencomp closed this issue 1 year ago

bencomp commented 1 year ago

I ran into a TypeError during segmentation on a fresh install of Kraken 4.2.0 via pip (on an HPC with pytorch and torchvision preinstalled).

```
python3.8/site-packages/kraken/lib/segmentation.py:357 in vectorize_regions

    354     if boundaries.type == 'Polygon':
    355         boundaries = [boundaries.boundary.simplify(10)]
    356     else:
❱   357         boundaries = [x.boundary.simplify(10) for x in unary_union(boundaries)]
    358     return [np.array(x.coords, dtype=np.uint)[:, [1, 0]].tolist() for x in boundaries]

TypeError: 'MultiPolygon' object is not iterable
```

I believe the cause is this change in Shapely 2.0: "Multi-part geometries will no longer be 'sequences' (length, iterable, indexable)." The package requirements don't specify an upper bound on the Shapely version, so pip installed Shapely 2.0. Downgrading to 1.8 lets me run segmentation without issues.
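Iterating over a multi-part geometry's `.geoms` attribute, instead of over the geometry itself, works on both Shapely 1.8 and 2.x, so the failing branch can be made version-independent. The sketch below is an illustration under that assumption: `iter_polygons` and `region_boundaries` are hypothetical helpers, not the fix that landed on Kraken's main branch. It also checks `geom_type` rather than `type` (Shapely 2.0 deprecates the `type` attribute) and replaces the NumPy coordinate swap with a plain list comprehension.

```python
from shapely.geometry import MultiPolygon, Polygon
from shapely.ops import unary_union


def iter_polygons(geom):
    """Yield the member polygons of a Polygon or MultiPolygon.

    Shapely 2.0 removed the sequence interface from multi-part geometries,
    so `for p in multipolygon` raises TypeError; `.geoms` works on 1.8 and 2.x.
    """
    if geom.geom_type == 'Polygon':
        yield geom
    elif geom.geom_type == 'MultiPolygon':
        yield from geom.geoms
    else:
        raise TypeError(f'unexpected geometry type: {geom.geom_type}')


def region_boundaries(regions, tolerance=10):
    """Hypothetical helper mirroring lines 354-358 of vectorize_regions:
    merge the regions, simplify each outline, return it as [y, x] pairs.

    Assumes hole-free polygons: the boundary of a polygon with holes is a
    MultiLineString, which has no .coords (the original code shares this).
    """
    merged = unary_union(regions)
    outlines = [p.boundary.simplify(tolerance) for p in iter_polygons(merged)]
    # Swap (x, y) to (y, x) and truncate to ints, approximating the original
    # np.array(..., dtype=np.uint)[:, [1, 0]] indexing.
    return [[[int(y), int(x)] for x, y in o.coords] for o in outlines]
```

For two disjoint squares, `unary_union` returns a `MultiPolygon` and `region_boundaries` yields two outlines; for two overlapping squares it returns a single `Polygon` and yields one.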

mittagessen commented 1 year ago

Yeah, I'm aware of it. Main is already compatible with Shapely 2.x, and I'll tag a new release later this week (if I get everything merged and the training tests written in time).