Closed · ynusinovich closed this issue 2 years ago
Have a look here: https://github.com/uakarsh/docformer/blob/master/examples/DocFormer_for_MLM.ipynb
The error is because the entity is not batched (i.e., it has a shape of (...) rather than (batch_size, ...)).
@uakarsh Thank you for your help! Does this mean that the Usage section of the README can't actually be used? I was trying to do a demo of it for my study group. I tried `encoding['resized_scaled_img'] = encoding['resized_scaled_img'].unsqueeze(0)` to add a batch size of 1, but that didn't work either.
It can be used; we just need to pass an argument, `add_batch_dim=True`, in the `dataset.create_features` function.
What you did also won't work on its own, because there are more than just image features; you need to unsqueeze the other features as well. I have updated the README; hope it helps.
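To illustrate, a minimal sketch of unsqueezing every feature in the encoding dict at once (the dict keys and shapes here are placeholders, and this is only an assumption about what `add_batch_dim=True` effectively does internally):

```python
import torch

def add_batch_dim(encoding):
    """Unsqueeze every tensor feature so shapes go from (...) to (1, ...)."""
    return {k: v.unsqueeze(0) if torch.is_tensor(v) else v
            for k, v in encoding.items()}

# Dummy encoding with placeholder shapes, just to show the transformation
encoding = {
    "resized_scaled_img": torch.rand(3, 224, 224),
    "x_features": torch.randint(0, 100, (512, 8)),
    "y_features": torch.randint(0, 100, (512, 8)),
}
batched = add_batch_dim(encoding)
print(batched["resized_scaled_img"].shape)  # torch.Size([1, 3, 224, 224])
```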
Thank you so much, it runs now! Unsqueezing each feature also works for me, but `add_batch_dim` is more straightforward. Are there any examples of follow-up steps (i.e., what the resulting tensor means in terms of the input image)? I can't find that in the README or examples.
Maybe you can have a look at the notebook I shared previously. In that notebook, you can go through the `DocFormerForMLM` class and look at its `forward` method. I'll briefly describe it here:
All the shapes are as per the default configuration:

- `self.embeddings` is responsible for encoding the spatial features of the bounding boxes (size -> (512, 768))
- `self.resnet` is responsible for extracting the image features (size -> (512, 768))
- `self.lang_emb` is responsible for extracting the language features from the words of the bounding boxes (size -> (512, 768))
- `self.encoder` calculates the attention and forward-propagates it (size -> (512, 768))

And then, for the downstream task, linear layers are attached. Hope it helps.
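The flow above can be sketched as a toy module. This is not the actual DocFormer implementation — the fusion step and layer internals are simplified assumptions based purely on the description of the four components and their (512, 768) outputs:

```python
import torch
import torch.nn as nn

SEQ, HID = 512, 768  # default configuration per the description above

class DocFormerSketch(nn.Module):
    """Toy stand-in mirroring the four components described above."""
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Linear(8, HID)        # spatial/bbox features -> (SEQ, HID)
        self.lang_emb = nn.Embedding(30522, HID)   # token ids -> (SEQ, HID)
        self.encoder = nn.TransformerEncoder(      # attention + forward propagation
            nn.TransformerEncoderLayer(d_model=HID, nhead=12, batch_first=True),
            num_layers=1)

    def forward(self, bbox, input_ids, img_feat):
        spatial = self.embeddings(bbox)      # (B, SEQ, HID)
        lang = self.lang_emb(input_ids)      # (B, SEQ, HID)
        x = lang + img_feat + spatial        # simplified fusion; the real model differs
        return self.encoder(x)               # (B, SEQ, HID)

model = DocFormerSketch()
out = model(torch.rand(1, SEQ, 8),                   # bbox features (8 dims assumed)
            torch.randint(0, 30522, (1, SEQ)),       # token ids
            torch.rand(1, SEQ, HID))                 # precomputed image features
print(out.shape)  # torch.Size([1, 512, 768])
```

For a downstream task, a linear head would then be attached on top of the (512, 768) encoder output.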
Ok, understood, thank you very much for your help. I'll close the issue since the example runs!
I tried following the usage instructions you posted on a sample .jpg image of a receipt. Every time I run it, I get an error saying, "RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 384, 500] instead". How do I fix that?
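The shape in that message matches a first ResNet convolution receiving an image without a batch dimension. A minimal reproduction of the mismatch and the fix (the layer here is a generic `nn.Conv2d` matching the weight shape in the error, not DocFormer's actual layer; older PyTorch versions raise this error on unbatched input):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=7)  # weight shape (64, 3, 7, 7), as in the error
img = torch.rand(3, 384, 500)           # 3-D input: (C, H, W), no batch dimension

# conv(img) is what triggers the RuntimeError on older PyTorch;
# adding a batch dimension gives the expected 4-D (1, C, H, W) input:
out = conv(img.unsqueeze(0))
print(out.shape)  # torch.Size([1, 64, 378, 494])
```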
Full code:
Full error: