shabie / docformer

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
MIT License
finetune the Docformer #42

Open jack-gits opened 2 years ago

jack-gits commented 2 years ago

Can I fine-tune the DocFormer model? Could you give some instructions, please? Thanks.

uakarsh commented 2 years ago

You can find the examples here

jack-gits commented 2 years ago

How about this one: https://github.com/shabie/docformer/blob/master/examples/DocFormer_for_MLM.ipynb? What is the difference from the link above?

jack-gits commented 2 years ago

DocFormer is based on Microsoft/LayoutLM. Can we use it for commercial purposes?

uakarsh commented 2 years ago

> How about this one: https://github.com/shabie/docformer/blob/master/examples/DocFormer_for_MLM.ipynb? What is the difference from the link above?

That notebook only covers a pre-training strategy (masked language modeling). Since you asked about fine-tuning, I shared the fine-tuning examples instead.
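As a rough illustration of that distinction, the sketch below contrasts how training targets are typically built for masked language modeling versus token classification. It is a generic, framework-free example and not code from this repository: `MASK_ID`, `mlm_example`, and `token_classification_example` are hypothetical names, and the `-100` ignore value is only the common PyTorch convention.

```python
import random

IGNORE_INDEX = -100  # common convention: positions with this label are skipped by the loss
MASK_ID = 103        # hypothetical [MASK] token id, chosen only for this sketch


def mlm_example(token_ids, mask_prob=0.15, seed=0):
    """Pre-training style: hide some tokens and ask the model to recover them."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)       # the model sees the mask token
            labels.append(tok)           # and must predict the original token
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)  # unmasked positions do not contribute to the loss
    return inputs, labels


def token_classification_example(token_ids, tag_ids):
    """Fine-tuning style: inputs are left untouched, labels come from annotations."""
    if len(token_ids) != len(tag_ids):
        raise ValueError("need exactly one tag per token")
    return list(token_ids), list(tag_ids)
```

In the first case the targets are derived from the text itself, so no annotation is needed; in the second they come from a labeled dataset for the downstream task.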

I think LayoutLM was recently allowed for commercial use. You can search for it online to confirm.

jack-gits commented 2 years ago

The license of LayoutLMv3 has been changed back, curiously.

jack-gits commented 2 years ago

image

If use_ocr=False, I can't encode the labels: the input parameters only contain words and boxes. How should I deal with the labels?

uakarsh commented 2 years ago

> The license of LayoutLMv3 has been changed back, curiously.

It looks like they don't want to allow commercial use. However, you can check whether LayoutLM can be used, since we only take the initial embedding weights from it.

uakarsh commented 2 years ago

image

> If use_ocr=False, I can't encode the labels: the input parameters only contain words and boxes. How should I deal with the labels?

I shared the link for using DocFormer for token classification earlier. You can visit it and adapt it for your own purpose.
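For readers facing the same question, here is a minimal sketch of one common way to carry word-level labels along when you supply your own words and boxes. It is not the repository's API: `toy_tokenize` and `align_labels` are hypothetical helpers, the toy tokenizer merely stands in for a real sub-word tokenizer, and using `-100` for continuation pieces is an assumption based on the usual PyTorch ignore-index convention.

```python
IGNORE_INDEX = -100  # assumed convention: these positions are skipped by the loss


def toy_tokenize(word, max_len=4):
    """Hypothetical stand-in for a sub-word tokenizer: fixed-size character chunks."""
    return [word[i:i + max_len] for i in range(0, len(word), max_len)]


def align_labels(words, boxes, word_labels, tokenize=toy_tokenize):
    """Expand word-level boxes and labels so they line up with sub-word tokens."""
    if not (len(words) == len(boxes) == len(word_labels)):
        raise ValueError("words, boxes and word_labels must have the same length")
    tokens, token_boxes, token_labels = [], [], []
    for word, box, label in zip(words, boxes, word_labels):
        for i, piece in enumerate(tokenize(word)):
            tokens.append(piece)
            token_boxes.append(box)  # every piece inherits its word's box
            # only the first piece of each word keeps the real label
            token_labels.append(label if i == 0 else IGNORE_INDEX)
    return tokens, token_boxes, token_labels


words = ["Invoice", "No", "12345678"]
boxes = [[10, 10, 80, 30], [85, 10, 110, 30], [115, 10, 200, 30]]
word_labels = [0, 0, 1]  # made-up class ids for illustration
tokens, token_boxes, token_labels = align_labels(words, boxes, word_labels)
```

The labels are then padded or truncated to the same length as the other encoded inputs before being passed to the model. With a real tokenizer, the same loop applies as long as you can tell which word each token came from.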