qurator-spk / sbb_pixelwise_segmentation

Pixelwise segmentation for document images
Apache License 2.0

`tf.keras` version that allows any input resolution and doesn't use `Lambda` layers #16

Open prhbrt opened 9 months ago

prhbrt commented 9 months ago

Since the U-Net architecture only uses layers that scale with the image dimensions, the fixed input dimensions seem artificial. I've added a zero-padding layer that pads the input up to the next multiple of 32. The padding is cropped off again at the end.
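A minimal sketch of the padding arithmetic described above, in plain Python. The helper names `pad_amounts` and `crop_slice` are hypothetical, and splitting the padding evenly between both sides is an assumption. The converted models may place it differently.

```python
def pad_amounts(size: int, multiple: int = 32) -> tuple[int, int]:
    """Return (before, after) zero-padding that lifts `size` to the next multiple."""
    total = -size % multiple   # 0 if size is already a multiple
    before = total // 2        # assumption: split the padding evenly
    return before, total - before


def crop_slice(size: int, multiple: int = 32) -> slice:
    """Slice that recovers the original extent from the padded output."""
    before, _ = pad_amounts(size, multiple)
    return slice(before, before + size)


height, width = 100, 33
print(pad_amounts(height), pad_amounts(width))
print(crop_slice(height), crop_slice(width))
```

For a network output of the padded size, indexing with `crop_slice(height)` and `crop_slice(width)` along the spatial axes would give back a prediction with the input's original resolution.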

Moreover, I've removed the `Lambda` layer, since it triggers a marshaling warning, and used `ZeroPadding2D`'s asymmetric padding feature instead. This removes the warning on `load_model`.
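To illustrate the idea, here is a small sketch that assumes the removed `Lambda` layer trimmed one row and one column after a symmetric `ZeroPadding2D`. That is a guess at the original code, not something confirmed in this thread. Under that assumption, a single asymmetric `ZeroPadding2D` gives the same result without any `Lambda`.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Dummy batch: 2 images, 3x3 pixels, 1 channel (channels_last).
x = tf.constant(np.arange(18, dtype="float32").reshape(2, 3, 3, 1))

# Assumed old behaviour: pad one pixel on every side, then cut off
# the last row and column (this slicing is what a Lambda would wrap).
symmetric = layers.ZeroPadding2D(padding=(1, 1))(x)
trimmed = symmetric[:, :-1, :-1, :]

# Lambda-free variant: pad only the top and the left side.
# padding=((top, bottom), (left, right))
asymmetric = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(x)

same = np.array_equal(np.asarray(trimmed), np.asarray(asymmetric))
print(asymmetric.shape, same)
```

Because `ZeroPadding2D` is a built-in layer, a model saved with it contains no serialized Python function, which is presumably why the warning disappears.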

I converted the Eynollah models to this architecture. They should now load without warnings, use the `tensorflow.keras` API, and can be found here.

This might allow you to skip the patching as used in Eynollah and speed up the whole process. Please let me know what you think.

Notes and sanity checks:

cneud commented 9 months ago

Hi @prhbrt, thanks a lot for looking into this and for contributing!

FYI, we are planning to update and refactor this repo and integrate the model training code with https://github.com/qurator-spk/eynollah for future maintenance, so this comes in very handy.

My colleagues @vahidrezanezhad and @michalbubula will be working on this, although for various reasons we likely won't be able to get our hands dirty much before March. But we will try our best to review and merge any contributions before then as well.

Btw the two models that you were missing should be available from our HF:

prhbrt commented 9 months ago

@cneud Understood! Could you in the meantime provide a list of all model architectures used for Eynollah (specifically the Python code)? I couldn't find Python code for the column classifier in particular, so that one still has fixed dimensions.

Also note that this yolo version might give slightly different outputs than your patching example, due to boundary conditions.
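A toy sketch of why the outputs can differ: with zero padding, a convolution sees zeros at every patch border, whereas a single pass over the whole image sees the real neighbouring pixels there. The 1-D example below uses non-overlapping patches, which is a simplification and not necessarily how Eynollah tiles its input.

```python
def conv_same(signal, kernel):
    """1-D 'same' convolution with zero padding at both ends."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 1.0, 1.0]

# One pass over the whole signal.
whole = conv_same(signal, kernel)

# Two independent patches, each zero-padded on its own.
patched = conv_same(signal[:4], kernel) + conv_same(signal[4:], kernel)

# Positions where the two strategies disagree: the samples at the seam.
differing = [i for i, (a, b) in enumerate(zip(whole, patched)) if a != b]
print(differing)
```

Only the samples next to the patch seam change here. In a deep network the affected band grows with the receptive field.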

Thank you in advance!

vahidrezanezhad commented 9 months ago

@prhbrt
Sure, I'll try to make the classifier code public in the meantime.