rossumai / elis-capture

React Native app for uploading documents to Elis

Show bounding box of the document #10

Open EskelCz opened 5 years ago

EskelCz commented 5 years ago

To produce a clean image of the document, whatever the aspect ratio, it would be nice to have a bounding box finding algorithm. It could overlay the found boundaries on top of the camera view in real time and black out the surroundings. Then we could apply a perspective transformation to produce the best possible scan.

Here is an example from an established document scanner app (Tiny Scanner): [image: a bounding box on top of a document photo]

It will probably require native code though. I'll find out if there is any open source implementation available.

EskelCz commented 5 years ago

It can be done with an image recognition network. Here is an example of integrating OpenCV into React Native: https://github.com/brainhubeu/react-native-opencv-tutorial

zepod commented 5 years ago

Great idea! Cutting the edges, however, is not a pain currently. The RIR layer does this automatically.

On the other hand, what is very painful atm (mainly for receipts) is the parallax effect and sharpness.

What I believe to be the way to go is to:

1) implement something similar to what Microsoft Lens or Evernote has, meaning a feature that tilts the image so it looks like the photo was taken from a 90deg angle even though it wasn't
2) add some kind of validation mechanism that alerts you when the image isn't sharp enough. This would happen right after you take the photo.
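For the sharpness check in 2), one common heuristic (not something confirmed in this thread) is the variance of the Laplacian: a blurry photo has few strong edges, so the variance of its second-derivative response is low. Below is a minimal sketch under stated assumptions: the `GrayImage` shape, the function names, and `SHARPNESS_THRESHOLD` are all hypothetical, and the threshold would have to be tuned on real receipt photos.

```typescript
// Hypothetical input shape: a grayscale image as row-major luminance values (0-255).
interface GrayImage {
  width: number;
  height: number;
  data: number[]; // length must equal width * height
}

// Variance of the 4-neighbour Laplacian response over all interior pixels.
// Low values suggest a blurry image; high values suggest strong edges.
function laplacianVariance(img: GrayImage): number {
  const { width, height, data } = img;
  if (width < 3 || height < 3) return 0; // no interior pixels to measure
  let sum = 0;
  let sumSq = 0;
  let count = 0;
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const i = y * width + x;
      const lap =
        data[i - width] + data[i + width] + data[i - 1] + data[i + 1] - 4 * data[i];
      sum += lap;
      sumSq += lap * lap;
      count++;
    }
  }
  const mean = sum / count;
  return sumSq / count - mean * mean;
}

// Assumed placeholder value, not a recommendation; tune it on real photos.
const SHARPNESS_THRESHOLD = 100;

function isSharpEnough(img: GrayImage, threshold: number = SHARPNESS_THRESHOLD): boolean {
  return laplacianVariance(img) >= threshold;
}
```

The app could call `isSharpEnough` on a downscaled grayscale copy of the photo right after capture and prompt the user to retake it when the result is `false`.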

Any ideas on how to implement this? And how difficult it could be?

zepod commented 5 years ago

I've looked it up a bit.. It's called image deskewing. It is composed of 3 steps, all machine learning-ish.

1) detect the document in the image (get the positions of its edges)
2) crop the image so it only displays the document
3) deskew it so it's 90deg
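Step 3 on its own does not need machine learning once the four corners from step 1 are known: it is a perspective (homography) transform that maps the detected quadrilateral onto an upright rectangle. Here is a rough sketch of that math only, assuming the corners are already available. The `Point` type and the function names are hypothetical, and a real implementation would more likely call a library such as OpenCV than hand-roll the solver.

```typescript
type Point = { x: number; y: number };

// Solve a·h = b by Gaussian elimination with partial pivoting.
function solveLinearSystem(a: number[][], b: number[]): number[] {
  const n = b.length;
  const m = a.map((row, i) => [...row, b[i]]); // augmented matrix
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
    }
    if (Math.abs(m[pivot][col]) < 1e-12) {
      throw new Error("Degenerate corner configuration");
    }
    [m[col], m[pivot]] = [m[pivot], m[col]];
    for (let r = col + 1; r < n; r++) {
      const f = m[r][col] / m[col][col];
      for (let c = col; c <= n; c++) m[r][c] -= f * m[col][c];
    }
  }
  const x = new Array<number>(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = m[r][n];
    for (let c = r + 1; c < n; c++) s -= m[r][c] * x[c];
    x[r] = s / m[r][r];
  }
  return x;
}

// Returns the 3x3 homography (row-major, last entry fixed to 1) that maps
// each src[i] onto dst[i]. Exactly four point pairs are required.
function computeHomography(src: Point[], dst: Point[]): number[] {
  if (src.length !== 4 || dst.length !== 4) {
    throw new Error("Exactly four point pairs are required");
  }
  const a: number[][] = [];
  const b: number[] = [];
  for (let i = 0; i < 4; i++) {
    const { x, y } = src[i];
    const { x: u, y: v } = dst[i];
    a.push([x, y, 1, 0, 0, 0, -u * x, -u * y]);
    b.push(u);
    a.push([0, 0, 0, x, y, 1, -v * x, -v * y]);
    b.push(v);
  }
  return [...solveLinearSystem(a, b), 1];
}

// Maps a single point through the homography.
function applyHomography(h: number[], p: Point): Point {
  const w = h[6] * p.x + h[7] * p.y + h[8];
  return {
    x: (h[0] * p.x + h[1] * p.y + h[2]) / w,
    y: (h[3] * p.x + h[4] * p.y + h[5]) / w,
  };
}
```

To warp actual pixels, the usual approach is to compute the homography from the output rectangle back to the detected corners, then sample the source photo for every output pixel. That sampling loop is left out of this sketch.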

It's quite challenging for a dev with no ML background.. Could be a good opportunity to team up with one of the research guys. I'll ping them..