Open Danee-wawawa opened 3 years ago
The image is divided into non-overlapping patches. A patch may contain zero or more characters, or only parts of characters. With position embeddings, the transformer is able to figure out how the parts fit into the whole, so splitting characters across patches has no impact. Two things I have not tried but that could be experimented with: overlapping patches, and smaller patches as done in DINO.
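For anyone who wants to see the idea concretely, here is a minimal NumPy sketch of the patching step described above. It is not the repository's code: the `patchify` helper, the 32x128 image size, the 16-pixel patch size, the embedding width of 64, and the random projection and position embeddings are all assumptions made for illustration (in a trained model the projection and position embeddings would be learned).

```python
import numpy as np

def patchify(image, patch_size, stride=None):
    """Split an (H, W) image into flattened square patches.

    Hypothetical helper for illustration only.
    stride == patch_size gives non-overlapping patches;
    stride < patch_size gives overlapping patches.
    """
    if stride is None:
        stride = patch_size
    height, width = image.shape
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patch = image[top:top + patch_size, left:left + patch_size]
            patches.append(patch.reshape(-1))
    return np.stack(patches)

rng = np.random.default_rng(0)

# Assumed toy input: a 32x128 grayscale "text line" image.
image = np.arange(32 * 128, dtype=np.float32).reshape(32, 128)

# Non-overlapping 16x16 patches: 2 rows x 8 columns = 16 patches.
tokens = patchify(image, patch_size=16)

# Stand-ins for the learned linear projection and position embeddings.
embed_dim = 64
projection = rng.normal(size=(tokens.shape[1], embed_dim)).astype(np.float32)
position_embedding = rng.normal(size=(tokens.shape[0], embed_dim)).astype(np.float32)

# Each patch token carries its position, so a character split across
# neighbouring patches can still be related by attention.
embedded = tokens @ projection + position_embedding

# The untried variant mentioned above: overlapping patches (stride 8).
overlapping = patchify(image, patch_size=16, stride=8)

print(tokens.shape, embedded.shape, overlapping.shape)
```

The patch boundaries here ignore where characters start and end, which is the situation asked about. The overlapping call only shows how the token count grows with a smaller stride; it says nothing about whether accuracy would improve.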
OK, thank you.
Hi, thank you for your work. This is a very meaningful project. I have a question about the algorithm design. You first convert the input image into patches. If some characters are cut off at patch boundaries, or if a patch contains multiple characters, will that have an impact? Looking forward to your reply.