Open Laxmi530 opened 2 years ago
You may follow the LayoutXLM paper and look at this https://github.com/microsoft/unilm/tree/master/layoutxlm#fine-tuning-for-relation-extraction
@wolfshow Thank you so much for the reply. I need an implementation example or a description of the procedure. I went through the Hugging Face site and @NielsRogge's tutorial, but most people only cover fine-tuning. I followed a process to parse the document but I am getting an error, which you can see below. Can you please help me?
@wolfshow Can you please share an example I can follow, or some guidance on how to use LayoutXLM?
@Laxmi530 hi, have you made it? I'm trying to make the same thing and I'm a bit lost, can you share the steps you took to get where you are/ give any tips?
@nurielw05 Hi, I tried the code below but I am getting an error. I also saw your code; it works quite well, but it is not exactly key-value pair extraction. If you are able to fix the error, please let me know.
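import numpy as np
import pytesseract
import torch
from PIL import Image
# Assumption: LayoutLMv2ForRelationExtraction is exported by the transformers build shown in the traceback paths
from transformers import AutoTokenizer, LayoutLMv2FeatureExtractor, LayoutLMv2ForRelationExtraction

feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)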
tokenizer = AutoTokenizer.from_pretrained(path, pad_token='<pad>')
model = LayoutLMv2ForRelationExtraction.from_pretrained(path)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
image_file = 'image4.png'
image = Image.open(image_file).convert('RGB')
image
width, height = image.size
w_scale = 1000/width
h_scale = 1000/height
ocr_data = pytesseract.image_to_data(image, output_type='data.frame')
ocr_data = ocr_data.dropna()
# scale the OCR coordinates to the 0-1000 range and keep the result
ocr_data = ocr_data.assign(left_scaled=ocr_data.left * w_scale, width_scaled=ocr_data.width * w_scale,
                           top_scaled=ocr_data.top * h_scale, height_scaled=ocr_data.height * h_scale,
                           right_scaled=lambda x: x.left_scaled + x.width_scaled,
                           bottom_scaled=lambda x: x.top_scaled + x.height_scaled)
float_cols = ocr_data.select_dtypes('float').columns
ocr_data[float_cols] = ocr_data[float_cols].round(0).astype(int)
ocr_data = ocr_data.replace(r'^\s*$', np.nan, regex=True)
ocr_data = ocr_data.dropna().reset_index(drop=True)
ocr_datawords = list(ocr_data.text)
coordinates = ocr_data[['left', 'top', 'width', 'height']]
actual_boxes = []
for idx, row in coordinates.iterrows():
    x, y, w, h = tuple(row)  # the row comes in (left, top, width, height) format
    actual_box = [x, y, x + w, y + h]  # we turn it into (left, top, left+width, top+height) to get the actual box
    actual_boxes.append(actual_box)
def normalize_box(box, width, height):
    return [
        int(1000 * (box[0] / width)),
        int(1000 * (box[1] / height)),
        int(1000 * (box[2] / width)),
        int(1000 * (box[3] / height)),
    ]
boxes = []
for box in actual_boxes:
    boxes.append(normalize_box(box, width, height))
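The box handling above can be checked by hand on a single word. The following is a minimal sketch, not the exact code from this post: `to_corner_box` is a hypothetical helper name, and the 800x600 page size and the sample box are made-up values for illustration.

```python
def to_corner_box(left, top, width, height):
    """Convert a (left, top, width, height) box to (x0, y0, x1, y1) corners."""
    return [left, top, left + width, top + height]


def normalize_box(box, page_width, page_height):
    """Rescale pixel corners to the 0-1000 grid used for the bbox input."""
    return [
        int(1000 * box[0] / page_width),
        int(1000 * box[1] / page_height),
        int(1000 * box[2] / page_width),
        int(1000 * box[3] / page_height),
    ]


# Made-up example: one word on an 800x600 pixel page.
page_width, page_height = 800, 600
corner = to_corner_box(left=100, top=150, width=200, height=30)
normalized = normalize_box(corner, page_width, page_height)

print(corner)      # [100, 150, 300, 180]
print(normalized)  # [125, 250, 375, 300]
```

Every normalized value should fall between 0 and 1000; a value outside that range usually means the wrong page dimension was used for that axis.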
encoding = tokenizer.encode_plus(ocr_datawords, boxes=boxes, return_tensors='pt')
input_id = encoding['input_ids']
attention_masks = encoding['attention_mask']
boxes = encoding['bbox']
encoding.keys()
outputs = model(**encoding)
This is the error
AttributeError Traceback (most recent call last)
c:\Users\name\Parallel\Trans_LayoutXLM.ipynb Cell 9 in <cell line: 1>()
----> 1 outputs = model(**encoding)
File c:\Users\name\.conda\envs\layoutlmft\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
File c:\Users\name\.conda\envs\layoutlmft\lib\site-packages\transformers\models\layoutlmv2\modeling_layoutlmv2.py:1585, in LayoutLMv2ForRelationExtraction.forward(self, input_ids, bbox, labels, image, attention_mask, token_type_ids, position_ids, head_mask, entities, relations)
1522 @add_start_docstrings_to_model_forward(LAYOUTLMV2_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1523 @replace_return_docstrings(output_type=RegionExtractionOutput, config_class=_CONFIG_FOR_DOC)
1524 def forward(
(...)
1535 relations=None,
1536 ):
1537 r"""
1538 entities (list of dicts of shape `(batch_size,)` where each dict contains:
1539 {
(...)
1582 >>> relations = *****
1583 ```"""
-> 1585 outputs = self.layoutlmv2(
1586 input_ids=input_ids,
1587 bbox=bbox,
1588 image=image,
1589 attention_mask=attention_mask,
1590 token_type_ids=token_type_ids,
1591 position_ids=position_ids,
1592 head_mask=head_mask,
...
--> 590 images_input = ((images if torch.is_tensor(images) else images.tensor) - self.pixel_mean) / self.pixel_std
591 features = self.backbone(images_input)
592 features = features[self.out_feature_key]
AttributeError: 'NoneType' object has no attribute 'tensor'
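The last frame of the traceback shows the visual backbone receiving `images=None`, which is consistent with `encoding` carrying only the tokenizer outputs and no `image` entry. Below is a small sketch of how one might check which `forward` parameters a call leaves unfilled. `forward_stub` and `unfilled_arguments` are hypothetical names; the stub only copies the parameter list printed in the traceback, and the encoding keys are the three that the code above reads (`input_ids`, `attention_mask`, `bbox`).

```python
import inspect


def forward_stub(input_ids=None, bbox=None, labels=None, image=None,
                 attention_mask=None, token_type_ids=None, position_ids=None,
                 head_mask=None, entities=None, relations=None):
    """Hypothetical stand-in that copies the parameter names from the traceback."""


def unfilled_arguments(forward_fn, encoding):
    """Return the parameter names of forward_fn that encoding does not supply."""
    parameters = inspect.signature(forward_fn).parameters
    return [name for name in parameters if name not in encoding]


# Placeholder values; only the key names matter for this check.
encoding_like = {"input_ids": None, "attention_mask": None, "bbox": None}

missing = unfilled_arguments(forward_stub, encoding_like)
print(missing)
```

With a real model, `model.forward` could be passed in place of the stub. If `image` turns out to be the missing piece, one plausible remedy (an assumption, not verified here) is to pass the pixel tensor produced by `feature_extractor(image, return_tensors='pt')` as the `image` argument. The relation extraction head may also expect `entities` and `relations`.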
Model I am using: LayoutLM. Can someone please guide me on how to extract key-value pairs from a scanned invoice using LayoutLM?