Chap 7 - Investigate LayoutLM as an alternative to GPT-4V

LayoutLM is a language model, much smaller than the GPT family, that extends the BERT architecture to incorporate a document's layout information, such as the bounding boxes, sizes, and positions of its text segments. Because it encodes both the textual and the visual (layout) features of a document, it can be fine-tuned for tasks such as document classification, form understanding, and entity extraction.
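To make the idea of "layout information" concrete, the sketch below shows one way the bounding boxes could be prepared before they reach the model. It assumes the convention described in the Hugging Face Transformers documentation for LayoutLM, where each box is given as (x0, y0, x1, y1) and rescaled to a 0-1000 range relative to the page size. The helper names (normalize_bbox, prepare_words) and the OCR output are hypothetical, chosen only for illustration; a real pipeline would take words and boxes from an OCR engine.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels or points


def normalize_bbox(bbox: Box, page_width: int, page_height: int) -> List[int]:
    """Rescale a box to the 0-1000 coordinate range (assumed LayoutLM convention)."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


def prepare_words(
    ocr_words: Sequence[Tuple[str, Box]], page_width: int, page_height: int
) -> Tuple[List[str], List[List[int]]]:
    """Split (word, box) pairs into parallel lists of words and normalized boxes."""
    words = [word for word, _ in ocr_words]
    boxes = [normalize_bbox(box, page_width, page_height) for _, box in ocr_words]
    return words, boxes


# Hypothetical OCR output for a 612 x 792 page (made-up values).
ocr_words = [
    ("Invoice", (50, 40, 150, 60)),
    ("Total:", (50, 700, 110, 720)),
    ("$42.00", (120, 700, 190, 720)),
]

words, boxes = prepare_words(ocr_words, page_width=612, page_height=792)
for word, box in zip(words, boxes):
    print(word, box)
```

The two parallel lists are the kind of input a LayoutLM tokenizer or processor would then turn into token IDs plus one box per token. Rescaling to a fixed range means the model sees comparable coordinates regardless of the scan resolution.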