xavctn / img2table

img2table is a table identification and extraction Python library for PDFs and images, based on OpenCV image processing
MIT License
577 stars, 76 forks

Height variable in pdf.py having value of 0 #234

Open UnayShah opened 6 days ago

UnayShah commented 6 days ago

I ran into an issue with a PDF where self.height had a value of 0 in the function linked here: https://github.com/xavctn/img2table/blob/f93294078e85b2dcc6efcf64c3d6e7e7e491f412/src/img2table/ocr/pdf.py#L84-L90

I suggest adding a check to this function, along these lines:

    def direction(self):
        if len(self.chars) >= 3 and self.width > 0 and self.height > 0:
            if self.width / self.height >= 2:
                return "horizontal"
            elif self.height / self.width >= 2:
                return "vertical"
        return "unknown"

Alternatively, a try/except block that returns an appropriate fallback value would also work.
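For comparison, the try/except variant might look roughly like the sketch below. LineSketch is a hypothetical stand-in I made up so the snippet runs on its own; it is not the actual class in pdf.py, and only the attribute names chars, width and height are taken from the snippet above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineSketch:
    """Hypothetical stand-in for the object that owns direction().

    This is NOT img2table's class; it only mirrors the attributes
    (chars, width, height) used in the suggestion above.
    """
    chars: List[str] = field(default_factory=list)
    width: float = 0.0
    height: float = 0.0

    def direction(self) -> str:
        # Too few characters to infer an orientation.
        if len(self.chars) < 3:
            return "unknown"
        try:
            if self.width / self.height >= 2:
                return "horizontal"
            if self.height / self.width >= 2:
                return "vertical"
        except ZeroDivisionError:
            # A zero width or height means the ratio is undefined.
            return "unknown"
        return "unknown"


if __name__ == "__main__":
    # A line with height 0 no longer raises, it falls back to "unknown".
    print(LineSketch(chars=list("abc"), width=10, height=0).direction())
```

One difference from the explicit check: the explicit `> 0` guard also returns "unknown" for negative dimensions, whereas this version only catches an exact zero. If negative values are possible, the explicit guard is probably the safer choice.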