jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

snap_x_tolerance and snap_y_tolerance for extra flexibility in table extraction #475

Closed: ratulbhadury closed this issue 2 years ago

ratulbhadury commented 3 years ago

Hi Jeremy

Before I get into my issue, I just want to say (again), thanks for all your efforts on this package. I've found it much easier to use than many of the other similar Python libraries.

I'm using v0.5.28. I'm faced with a scenario where I would like to specify different tolerances for the x and y directions when it comes to 'snapping' parallel lines into one.

I see the exact same request was made some years ago, and it looks as if it was accepted and implemented back then in the v0.6 branch, but it hasn't made its way into the master branch.

Do you have plans to include this into the master branch? If not, can you suggest any workarounds for how I could get this to work?

Thanks and kind regards,

Ratul
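
For readers unfamiliar with the term, the request above can be illustrated with a small standalone sketch. It assumes that "snapping" means grouping nearly equal coordinates and replacing each with its group's average; the helper `snap_values` and the sample coordinates are hypothetical and are not taken from pdfplumber's source.

```python
from statistics import mean


def snap_values(values, tolerance):
    """Replace each value with the mean of its cluster.

    A cluster is a run of sorted values in which each neighbour differs
    from the previous one by at most `tolerance`. Hypothetical helper,
    not pdfplumber's implementation.
    """
    if not values:
        return []
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] <= tolerance:
            clusters[-1].append(value)  # close enough: same cluster
        else:
            clusters.append([value])  # gap too large: start a new cluster
    # Map every original value to the centre of its cluster.
    centre_of = {}
    for cluster in clusters:
        centre = mean(cluster)
        for value in cluster:
            centre_of[value] = centre
    return [centre_of[value] for value in values]


# Made-up coordinates: x positions of vertical lines, y positions of horizontal lines.
vertical_x = [100.0, 101.0, 200.0]
horizontal_y = [50.0, 53.0, 90.0]

# A tight tolerance in x and a looser one in y.
snapped_x = snap_values(vertical_x, tolerance=1.5)
snapped_y = snap_values(horizontal_y, tolerance=4.0)

# With one shared tolerance of 1.5, the two nearby horizontal lines stay apart.
snapped_y_shared = snap_values(horizontal_y, tolerance=1.5)
```

In this toy data a single tolerance cannot merge the horizontal lines at 50 and 53 without also loosening the x direction, which is the situation the issue describes.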

samkit-jain commented 3 years ago

Yes, it is a useful feature request. Feel free to raise a PR if you have a solution in mind.

ratulbhadury commented 3 years ago

Hi @samkit-jain

I haven't investigated a solution yet, but I have seen that this feature was requested previously and is in fact part of the v0.6 alpha branch.

I have not yet looked deeply into the architecture of the v0.6 branch or how it differs from the stable branch, but would it be possible to merge this feature, which already exists in that branch, into the stable branch? Please refer to PR #51.

Many thanks

jsvine commented 2 years ago

Hi @ratulbhadury, and thanks again for this suggestion. Your requested feature is now available in v0.6.0! See https://github.com/jsvine/pdfplumber/pull/553 for details.
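
A minimal usage sketch, assuming pdfplumber v0.6.0 or later is installed: the settings named in the issue title, `snap_x_tolerance` and `snap_y_tolerance`, are passed through the table settings dictionary. The tolerance values, the function name `extract_first_table`, and the choice of the first page are illustrative assumptions, not recommendations from the maintainers.

```python
# Per-axis snapping settings. The numeric values are illustrative only and
# should be tuned to the PDF at hand.
TABLE_SETTINGS = {
    "vertical_strategy": "lines",
    "horizontal_strategy": "lines",
    "snap_x_tolerance": 2,  # tolerance in the x direction
    "snap_y_tolerance": 6,  # tolerance in the y direction
}


def extract_first_table(pdf_path, table_settings=TABLE_SETTINGS):
    """Return the table pdfplumber extracts from the first page, or None.

    Hypothetical wrapper for illustration. pdfplumber is imported lazily so
    the settings above can be inspected without the library installed.
    """
    import pdfplumber  # assumes pdfplumber >= 0.6.0

    with pdfplumber.open(pdf_path) as pdf:
        first_page = pdf.pages[0]
        return first_page.extract_table(table_settings)
```

Calling `extract_first_table("report.pdf")` (a placeholder file name) would then apply the two tolerances independently. If only `snap_tolerance` is given, it should serve as the default for both directions, but check the linked pull request for the exact behavior.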