microsoft / OCR-Form-Tools

A set of tools to use in Microsoft Azure Form Recognizer and OCR services.
MIT License

Issue with the correct labeling of selectionMarks - v2.1 - preview.3 #916

Open at-philipp-heinrich opened 3 years ago

at-philipp-heinrich commented 3 years ago

Description: We have several files with an identical layout and selectionMarks in the same places, but filled in differently. In the tag editor, the selection marks are labeled on some pages and not on others. Even if I draw a region myself, it only returns NULL. The behavior also differs depending on whether the marks are handwritten or not.

The analyzed result shows the same behavior: in some files the selection marks are labeled and in others they are not. Because the files are identical in layout, there is no real clue as to why it reacts that way.

Questions:

Edit: The problem also occurs with other selectionMarks that are not so close to each other, as seen below in the 2nd image. So in this case it can't be due to the layout.

Additional context:

[Image: Fott_examples]

[Image: Fott_examples_2]

RJWerning commented 3 years ago

I've seen similar issues with 'radio' type selectionMarks; the checkbox style seems to always work. It's actually an issue with the Form Recognizer layout analyze API, not FOTT. You can see this by using the "layout analyze" option in fott-preview: the selection marks you have issues with will not be found there either.
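If you want to run the same check outside the FOTT UI, here is a minimal Python sketch that counts the selection marks the layout API reported on each page of an already-downloaded analyze result. It assumes the v2.1 layout response shape as I understand it (`analyzeResult.readResults[].selectionMarks[]`, each mark carrying `state`, `confidence` and `boundingBox`); the function names and the sample data are made up for illustration and are not part of FOTT or any SDK.

```python
from typing import Any, Dict, List


def selection_marks_per_page(result: Dict[str, Any]) -> Dict[int, List[Dict[str, Any]]]:
    """Group the selection marks found by layout analysis by page number.

    Assumes `result` is the parsed JSON of a layout analyze response with
    the shape analyzeResult.readResults[].selectionMarks[] (an assumption).
    """
    pages = result.get("analyzeResult", {}).get("readResults", [])
    marks_by_page: Dict[int, List[Dict[str, Any]]] = {}
    for index, page in enumerate(pages, start=1):
        page_number = page.get("page", index)
        # A page without a "selectionMarks" key is treated as having none.
        marks_by_page[page_number] = [
            {
                "state": mark.get("state"),
                "confidence": mark.get("confidence"),
                "boundingBox": mark.get("boundingBox"),
            }
            for mark in page.get("selectionMarks", [])
        ]
    return marks_by_page


def pages_with_missing_marks(result: Dict[str, Any], expected_count: int) -> List[int]:
    """Return page numbers where fewer marks were detected than expected."""
    return [
        page_number
        for page_number, marks in selection_marks_per_page(result).items()
        if len(marks) < expected_count
    ]


# Hypothetical sample: two pages with the same layout, but the marks were
# only detected on page 1.
sample_result = {
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "selectionMarks": [
                    {"boundingBox": [1, 1, 2, 1, 2, 2, 1, 2], "confidence": 0.9, "state": "selected"},
                    {"boundingBox": [3, 1, 4, 1, 4, 2, 3, 2], "confidence": 0.8, "state": "unselected"},
                ],
            },
            {"page": 2},
        ]
    }
}

if __name__ == "__main__":
    for page_number, marks in selection_marks_per_page(sample_result).items():
        print(page_number, len(marks), [mark["state"] for mark in marks])
    print("pages missing marks:", pages_with_missing_marks(sample_result, expected_count=2))
```

Running this over the results for several files with the same layout should show quickly which pages lost their marks at the layout step, before FOTT is involved at all.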

I posted a question on StackOverflow about this and heard back from Microsoft on it. I was able to send them some of the images I was having issues with; they ran them through the new version of the detection API and said that the issues I was facing are fixed in the next preview release, scheduled for ~5/21. https://stackoverflow.com/questions/67183842/training-custom-form-selectionmark-bounding-box-identification-issues

Couple suggestions that may help:

-Rich W