mittagessen / seg

neural layout analysis tests.

ARU-Net #3

Closed ghost closed 6 years ago

ghost commented 6 years ago

@mittagessen Thank you for your hard work, you are appreciated.

Currently, ARU-Net is the most accurate and advanced layout analysis method; it even supersedes DMRZ. This is a request that you consider implementing ARU-Net. https://github.com/TobiasGruening/ARU-Net https://arxiv.org/abs/1802.03345

The image below shows the results of the ICDAR 2017 Competition on Baseline Detection: [image: fin]

The ICDAR 2017 cBAD training and evaluation dataset used (simple & complex): https://scriptnet.iit.demokritos.gr/competitions/5/1/ https://zenodo.org/record/835441 https://arxiv.org/abs/1705.03311

mittagessen commented 6 years ago

No. ARU-Net is a horrible mess of a system, full of postprocessing algorithms for questionable gains in a metric that is completely independent of recognition accuracy. The basic underlying network is a U-Net, which is just another type of FCN; the difference between the residual blocks implemented here and their DenseNet-like construction is probably negligible.

As the readme of the baseline branch says, I'm trying to use a relatively simple convolutional network to expand baseline masks to complete lines without extensive postprocessing. Baselines are easily separable into objects, while fine-grained pixel labellings (as produced by the main branch net) are not. Using baselines as seeds for expansion allows object detection on closely located and intermingled objects, something most other object detection networks fail at. Unfortunately, while the neural expansion part works just fine in my tests, all the baseline detector architectures I've tried converge on empty output images, even when using weighted gradients.
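
To illustrate the point that baselines are easily separable into objects, here is a minimal sketch of connected-component labelling on a binary baseline mask. It is not code from this repository: the function name `label_baselines`, the choice of 8-connectivity, and the toy mask are all assumptions made for illustration. Each labelled component could then serve as one seed for the expansion step described above.

```python
from collections import deque


def label_baselines(mask):
    """Label 8-connected foreground regions of a binary mask.

    `mask` is a list of equal-length rows containing 0 (background) or
    1 (baseline pixel). Returns (labels, count), where `labels` has the
    same shape as `mask` and holds 0 for background or a component id
    starting at 1. Hypothetical helper, not part of the seg repository.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            # Skip background pixels and pixels that already have a label.
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            # Breadth-first flood fill over the 8-neighbourhood.
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Toy mask (assumed data): two thin baselines separated by one empty row.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
toy_labels, toy_count = label_baselines(toy_mask)
print(toy_count)  # 2: one component per baseline
```

The same labelling applied to a full text-line pixel mask would merge lines wherever ascenders and descenders touch, which is the separability problem the comment describes for fine-grained pixel labellings.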