microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

Can Table Transformer be used to detect multiple tables in an image? #157

Open theshahshow opened 7 months ago

theshahshow commented 7 months ago
[Image: Screenshot 2023-12-04 at 1 08 29 AM]

For this image, when I run the Table Transformer, I get only one prediction. If I crop the image and run the model on the individual crops, the results are as expected. Is it possible to pass the whole image and get multiple predictions?

Code used:

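import torch
from transformers import DetrFeatureExtractor, TableTransformerForObjectDetection

# `image` is assumed to be an HWC NumPy array loaded beforehand
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")
feature_extractor = DetrFeatureExtractor()
encoding = feature_extractor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**encoding)

# Rescale predicted boxes to the original image size
height, width = image.shape[:2]  # HWC
results = feature_extractor.post_process_object_detection(outputs, threshold=0.4, target_sizes=[(height, width)])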

Also, the results I get when uploading the image to https://huggingface.co/microsoft/table-transformer-detection differ from the results I get when running the model locally. Does anyone know why?

NielsRogge commented 7 months ago

Sure you can, see #158. You can take all detections that are not of the "no object" class, as shown in the notebook.
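For illustration, the filtering rule described above could be sketched in plain Python as follows. This is not the notebook's code and does not call the library. It assumes DETR-style outputs where each query has one row of class logits and the last entry is the "no object" class. The names `softmax`, `keep_detections`, and the sample logits and boxes are hypothetical.

```python
import math


def softmax(row):
    """Turn one row of raw logits into probabilities."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def keep_detections(logits, boxes, threshold=0.4):
    """Keep every query whose best class is not "no object".

    Assumption: the last logit of each row is the "no object" class.
    """
    kept = []
    for row, box in zip(logits, boxes):
        probs = softmax(row)
        no_object = len(probs) - 1
        label = max(range(len(probs)), key=probs.__getitem__)
        if label != no_object and probs[label] >= threshold:
            kept.append({"label": label, "score": probs[label], "box": box})
    return kept


# Made-up example: three queries, two real classes plus "no object".
logits = [
    [4.0, 0.0, 0.0],  # confident table
    [0.0, 0.0, 5.0],  # "no object", dropped
    [3.5, 0.0, 0.5],  # second table
]
boxes = [(10, 10, 200, 120), (0, 0, 5, 5), (10, 150, 200, 260)]

detections = keep_detections(logits, boxes)
for det in detections:
    print(det["label"], round(det["score"], 3), det["box"])
```

Because each query is kept or dropped independently, a single image can yield several tables. If only one box comes back, it may be worth checking whether the other queries fall below the chosen threshold.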