ibm-aur-nlp / PubLayNet


How do i use pretrained model for prediction? #13

Open monuminu opened 4 years ago

monuminu commented 4 years ago

Hi All,

Thanks a lot for this awesome dataset and the pretrained weights. I wanted to know how I can use them to predict bounding boxes given a page image.

zhxgj commented 4 years ago

Hi @monuminu The model was trained with the Detectron framework, so you can load it in Detectron and use its inference examples. I will also provide a Jupyter notebook for inference soon.
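
In the meantime, Detectron's bundled tools/infer_simple.py script should be usable roughly like this (the config and weights paths below are placeholders for wherever you saved the released PubLayNet files):

python tools/infer_simple.py \
    --cfg path/to/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml \
    --wts path/to/model_final.pkl \
    --output-dir /tmp/publaynet-vis \
    --image-ext jpg \
    path/to/page/images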

Lambert-Shirzad commented 4 years ago

Hi, once again, thanks for releasing your great work. I am not sure whether we should get Detectron or Detectron2. Detectron is in Caffe2, and installing Caffe at this point is just a pain. Detectron2 is a rewrite of Detectron in PyTorch and seems to enjoy better support. Looking forward to the tutorial!

zhxgj commented 4 years ago

Hi @Lambert-Shirzad Thank you. I trained the models on Detectron, which is powered by Caffe2. But I found the installation was quite easy. I just followed the instructions in this link. We also plan to retrain the models in the Detectron2 framework.

monuminu commented 4 years ago

I agree with @Lambert-Shirzad. It would be awesome if you could train on Detectron2. I am also waiting for your Jupyter notebook; that would help me greatly.

hpanwar08 commented 4 years ago

If anyone is interested, I have trained it on Detectron2. You can find training config and trained models (resnet101, resnext101) in this repo https://github.com/hpanwar08/detectron2

Note: The models are trained on ~60% of the original dataset for ~1.5 epochs, but they work well as a starting point for fine-tuning on domain-specific datasets.

nodechef commented 4 years ago

@hpanwar08 I am trying your trained models for prediction but I get this error: Config '\configs\DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.

hpanwar08 commented 4 years ago

@nodechef This is just a warning; you should still be able to get the predictions. I have updated the models. Please download the trimmed model "model_final_trimmed.pth" from Dropbox; it is smaller in size.

Use the command below for prediction:

python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu
nodechef commented 4 years ago

@hpanwar08 Yeah, it worked; I later realized it was just a warning. However, I have a question: how do we get the predicted classes, like title, paragraph, figure?

nodechef commented 4 years ago

I guess we need to add this to demo.py in order to get the predicted classes, right? Correct me if I am wrong.

from detectron2.data import MetadataCatalog
MetadataCatalog.get("dataset_name").thing_classes = ["title", "text","figure", "table","list"]

The dataset name would be the one used for training?

hpanwar08 commented 4 years ago

@nodechef If you want just the predictions and not the visualization, then use detectron2.engine.defaults.DefaultPredictor() to get the classes. You will get class indexes, which you then need to convert to actual class names. Refer to lines 35 and 48 in predictor.py.

Or you can try this

import detectron2.data.detection_utils
import detectron2.engine.defaults

# cfg is the detectron2 config built from DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml
classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')
pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]

"instances" contains the bbox, class probabilities and class indexes.

nodechef commented 4 years ago

@hpanwar08 I am looking for class labels instead of percentages for each bbox (along with the visualization).

hpanwar08 commented 4 years ago

@nodechef If that is the case, then what you said should work. dataset_name will be "dla_val"

nodechef commented 4 years ago
MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ['text', 'title', 'list', 'table', 'figure']

v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]) , scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(v.get_image()[:, :, ::-1])

This worked.

hpanwar08 commented 4 years ago

@deepseek Done!

elnazsn1988 commented 4 years ago

Hi, is there an internal feature that lets each class be saved as a separate segment or image? I am trying to identify tables, separate them, and then run them through a tabular data analyzer and OCR. So far I am able to get the image predictions with your code, but not the actual annotations/segmented fields for further analysis/OCR.

elnazsn1988 commented 4 years ago

@nodechef If you want just the predictions and not visualization then use detectron2.engine.defaults.DefaultPredictor() to get the classes. You will get class indexes, then you need to convert class indexes to actual class names. Refer line 35 and 48 in predictor.py

Or you can try this

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')
pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]

"instances" contains the bbox, class probabilities and class indexes.

Sorry, can you expand on where I would add/change the lines you posted? Do I run them as an external .py script and reference inputs/outputs, or do I change the prediction code as mentioned above?

elnazsn1988 commented 4 years ago
MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ['text', 'title', 'list', 'table', 'figure']

v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]) , scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(v.get_image()[:, :, ::-1])

This worked.

@nodechef Does this give you both the classes and the visualization? Could you elaborate on whether you changed these lines in the predict.py file or ran an external .py file?

elnazsn1988 commented 4 years ago

MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ['text', 'title', 'list', 'table', 'figure']

v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(v.get_image()[:, :, ::-1])

@nodechef When adding the script you posted to the bottom of demo.py, I get the following error:

Traceback (most recent call last):
  File "C:\projects\pytorch\detectron2\demo\demo.py", line 155, in <module>
    v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]) , scale=1.2)
NameError: name 'Visualizer' is not defined

When adding it to the top of the demo.py file, I get the following:

Traceback (most recent call last):
  File "C:\projects\pytorch\detectron2\demo\demo.py", line 22, in <module>
    MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ['text', 'title', 'list', 'table', 'figure']
NameError: name 'cfg' is not defined

Is there somewhere specific I should be adding your edit?

elnazsn1988 commented 4 years ago

@nodechef If you want just the predictions and not visualization then use detectron2.engine.defaults.DefaultPredictor() to get the classes. You will get class indexes, then you need to convert class indexes to actual class names. Refer line 35 and 48 in predictor.py

Or you can try this

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')
pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]

"instances" contains the bbox, class probabilities and class indexes.

@hpanwar08 Thanks, the above throws no errors. How can I save the instances to a location on my system? I am running it through the shell.

hpanwar08 commented 4 years ago

@nodechef If you want just the predictions and not visualization then use detectron2.engine.defaults.DefaultPredictor() to get the classes. You will get class indexes, then you need to convert class indexes to actual class names. Refer line 35 and 48 in predictor.py Or you can try this

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')
pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]

"instances" contains the bbox, class probabilities and class indexes.

@hpanwar08 Thanks, the above throws no errors. How can I save the instances to a location on my system? I am running it through the shell.

@elnazsn1988 In addition to the above code, you can extract the bounding boxes, crop the image based on each bounding box, and save the crops.

import cv2
import numpy as np

import detectron2.data.detection_utils
import detectron2.engine.defaults
import detectron2.structures.boxes

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')

pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
boxes = instances.pred_boxes
if isinstance(boxes, detectron2.structures.boxes.Boxes):
    boxes = boxes.tensor.numpy()
else:
    boxes = np.asarray(boxes)

for i, (label, bbox) in enumerate(zip(labels, boxes)):
    if label == "table":
        x1, y1, x2, y2 = map(int, bbox)                # bbox is [x1, y1, x2, y2]
        cropped_img = img[y1:y2, x1:x2]                # img is a BGR numpy array, so slice instead of PIL crop
        cv2.imwrite(f"{label}_{i}.png", cropped_img)   # save the BGR crop directly

Hope this helps.
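
If you also want the raw detections saved to disk rather than only image crops, a small sketch (reusing the instances and classes from the snippet above; the output filename is a placeholder) could dump the boxes, labels and scores to JSON:

import json

records = []
for box, cls, score in zip(instances.pred_boxes.tensor.numpy(),
                           instances.pred_classes.numpy(),
                           instances.scores.numpy()):
    records.append({
        "bbox": [float(v) for v in box],   # [x1, y1, x2, y2]
        "label": classes[int(cls)],
        "score": float(score),
    })

with open("predictions.json", "w") as f:
    json.dump(records, f, indent=2)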

pollyMath commented 4 years ago

@nodechef If you want just the predictions and not visualization then use detectron2.engine.defaults.DefaultPredictor() to get the classes. You will get class indexes, then you need to convert class indexes to actual class names. Refer line 35 and 48 in predictor.py Or you can try this

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')
pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]

"instances" contains the bbox, class probabilities and class indexes.

Sorry, can you expand on where I would add/change the lines you posted? Do I run them as an external .py script and reference inputs/outputs, or do I change the prediction code as mentioned above?

This may already be resolved, but I will leave it here for future reference. Based on @nodechef's code, what I did was add the following lines in demo.py after demo.run_on_image:

from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer

MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ["text",
                                                            "title",
                                                            "list",
                                                            "table",
                                                            "figure"]
v = Visualizer(img[:, :, ::-1],
               MetadataCatalog.get(cfg.DATASETS.TRAIN[0]),
               scale=1.2)
v = v.draw_instance_predictions(predictions["instances"].to("cpu"))

and then cv2.imwrite(out_filename, v.get_image()[:, :, ::-1]) where the output filename is available. Note that you may have to reorder the labels in the list of class names.

ChungNPH commented 4 years ago

Did anyone try to fine-tune with other categories? For example, I have additional categories such as authors and introduction, so my categories are [title, text, figure, table, list, authors, introduction]. I know that IBM does not plan to add categories, but how can I try to do this?

zhxgj commented 4 years ago

Did anyone try to fine-tune with other categories? For example, I have additional categories such as authors and introduction, so my categories are [title, text, figure, table, list, authors, introduction]. I know that IBM does not plan to add categories, but how can I try to do this?

Hi @ChungNPH, do you have annotations for the additional categories?

ChungNPH commented 4 years ago

Did anyone try to fine-tune with other categories? For example, I have additional categories such as authors and introduction, so my categories are [title, text, figure, table, list, authors, introduction]. I know that IBM does not plan to add categories, but how can I try to do this?

Hi @ChungNPH, do you have annotations for the additional categories?

I created a custom dataset myself; it differs somewhat from the PubLayNet format, e.g.: { image: { file name: 'filename', height: 'height', width: 'width', id: 'id', annotations: { obj 1: [], obj 2: [] } } }

And I tried to train on PubLayNet alone with Detectron2, but maybe Detectron2 does not understand the PubLayNet format of { 'image': [...], 'annotations': [...], 'categories': [...] }. Should I change the JSON file format to train? Can you show me an overview of how to train with PubLayNet? Thank you so much!

ChungNPH commented 4 years ago

https://github.com/facebookresearch/detectron2/blob/master/docs/tutorials/datasets.md

I see that Detectron2 needs the dataset to be re-formatted into its own format before training. Is that right?

jmandivarapu1 commented 4 years ago

Hi @monuminu The model was trained with the Detectron frame work. So you can load it in Detectron and use its inference examples. I will also provide a Jupyter notebook for inference soon.

Hey, can anybody help me correctly load the pretrained model and test it on documents?

I am trying to use the pretrained weights on the test set images (e.g. PMC3654277_00006).

Case 1: Using config configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml

Code: python3.7 demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input examples/PMC3654277_00006.jpg --output out/ --confidence-threshold 0.25 --opts MODEL.DEVICE cpu

But it's giving me pretty bad results. If I increase the confidence threshold above 0.25, there are no detections.

Case 2: Using config pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml

Code: python3.7 demo/demo.py --config-file pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml --input examples/PMC3576793_00004.jpg --output out/ --confidence-threshold 0.2 --opts TYPE Mask-RCNN MODEL.DEVICE cpu

This throws the error below:

'pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
  File "demo/demo.py", line 76, in <module>
    cfg = setup_cfg(args)
  File "demo/demo.py", line 28, in setup_cfg
    cfg.merge_from_file(args.config_file)
  File "/opt/anaconda3/lib/python3.7/site-packages/detectron2/config/config.py", line 49, in merge_from_file
    self.merge_from_other_cfg(loaded_cfg)
  File "/opt/anaconda3/lib/python3.7/site-packages/fvcore/common/config.py", line 120, in merge_from_other_cfg
    return super().merge_from_other_cfg(cfg_other)
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 464, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 477, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.TYPE'
hpanwar08 commented 4 years ago

Where did you download the trained model from?

hpanwar08 commented 4 years ago

https://github.com/facebookresearch/detectron2/blob/master/docs/tutorials/datasets.md

I see that Detectron2 needs the dataset to be re-formatted into its own format before training. Is that right?

You have probably solved it by now, but just for info: if your data is in COCO format, then it's easy to train Detectron2. And yes, Detectron2 converts the data into its own data structure.

https://github.com/hpanwar08/detectron2
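
For future reference, since PubLayNet's annotation files are already COCO-style JSON, a rough sketch for registering them (or a custom COCO-format dataset) with Detectron2 could look like the lines below; the file and directory paths are placeholders.

from detectron2.data.datasets import register_coco_instances

# Placeholder paths; point them at your local annotation JSON files and image folders.
register_coco_instances("publaynet_train", {}, "publaynet/train.json", "publaynet/train")
register_coco_instances("publaynet_val", {}, "publaynet/val.json", "publaynet/val")

# The registered names can then be referenced in the config, e.g.
# cfg.DATASETS.TRAIN = ("publaynet_train",)
# cfg.DATASETS.TEST = ("publaynet_val",)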

ShabariGadewar commented 3 years ago

https://github.com/hpanwar08/detectron2

Hi, I need a little help; I'm new to this. I just want to test the PubLayNet pre-trained models on the test data provided with PubLayNet. I have downloaded the .pkl file and un-pickled it, but I don't know how to use it on the test data. Can you help me out? Thank you.

hpanwar08 commented 3 years ago

You could try this https://github.com/hpanwar08/document-layout-analysis-app

ankur-sentieo commented 3 years ago

Hey guys @ShabariGadewar @zhxgj, do you have a Jupyter notebook now for loading and testing the PubLayNet pre-trained model? I am also looking to test the PubLayNet pre-trained models on the test data provided with PubLayNet, but I get the same error that @jmandivarapu1 reported in his comment above. Also @hpanwar08, I was able to run your trimmed models, but can you please guide us on how to further train the pre-trained (trimmed) models that you have included in your repo?

'pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
  File "demo/demo.py", line 76, in <module>
    cfg = setup_cfg(args)
  File "demo/demo.py", line 28, in setup_cfg
    cfg.merge_from_file(args.config_file)
  File "/opt/anaconda3/lib/python3.7/site-packages/detectron2/config/config.py", line 49, in merge_from_file
    self.merge_from_other_cfg(loaded_cfg)
  File "/opt/anaconda3/lib/python3.7/site-packages/fvcore/common/config.py", line 120, in merge_from_other_cfg
    return super().merge_from_other_cfg(cfg_other)
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 464, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 477, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.TYPE'
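
Not an official recipe, but a minimal fine-tuning sketch with Detectron2's DefaultTrainer, assuming a dataset has already been registered (for example with register_coco_instances) and that the config and trimmed weights are the ones from the repo above; the dataset names and solver settings are placeholders to adjust:

import os

from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file("configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "model_final_trimmed.pth"   # start from the trimmed checkpoint
cfg.DATASETS.TRAIN = ("my_layout_train",)       # placeholder registered dataset names
cfg.DATASETS.TEST = ("my_layout_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5             # text, title, list, table, figure
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 30000
cfg.OUTPUT_DIR = "./output"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()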

under-score commented 2 years ago

In support of @ankur-sentieo's request for a Jupyter/Colab/Binder notebook, or at least a working Docker container.

opyate commented 1 year ago

If anyone is interested, I have trained it on Detectron2. You can find training config and trained models (resnet101, resnext101) in this repo https://github.com/hpanwar08/detectron2

Note: The models are trained on ~60% of the original dataset for ~1.5 epochs, but they work well as a starting point for fine-tuning on domain-specific datasets.

Thanks for that!

Note that the layout-parser team also retrained with detectron2. See: https://github.com/Layout-Parser/layout-parser/blob/main/src/layoutparser/models/detectron2/catalog.py#L25

And https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html#model-catalog
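
Not tested against every version, but a minimal layoutparser sketch using that PubLayNet catalog entry might look like this (the config string and label map are taken from the catalog/model zoo linked above; the image path is a placeholder):

import cv2
import layoutparser as lp

# Mask R-CNN trained on PubLayNet, pulled from the layout-parser model zoo.
model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config",
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.7],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)

image = cv2.imread("page.jpg")[..., ::-1]   # layoutparser expects RGB
layout = model.detect(image)

for block in layout:
    print(block.type, block.coordinates, block.score)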

s-jays commented 4 months ago

Hi, just wondering how I can save the annotation output from the model in the same format as PubLayNet's dataset?