shabie / docformer

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
MIT License
255 stars 40 forks source link

what this line represent? #10

Closed mayankpathaklumiq closed 3 years ago

mayankpathaklumiq commented 3 years ago

output = docformer(v_bar, t_bar, v_bar_s, t_bar_s) # shape (1, 512, 768)

shabie commented 3 years ago

This represents the output of the docformer encoder. Weights are randomly initialized so if you wanted to train it, you'd have to put a head on top of it.

uakarsh commented 3 years ago

@mayankpathaklumiq This implementation has been done, with the variables name, similar to that of the paper, so in order to fully grasp it, maybe you just need to keep the code and the paper, side by side, and everything would be clear, since this is what we did