jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Extracting tables from multiple pages at a time. #549

Closed Harshit-tech9 closed 2 years ago

Harshit-tech9 commented 2 years ago

Please describe, in as much detail as possible, your proposal and how it would improve your experience with pdfplumber.

Hey! Thanks for developing such a good library. I am currently working on a project where I need to extract tables from a bank account statement. With this library, we can extract a table from one page at a time, but we cannot iterate over multiple pages. It's a humble request to you folks to add this feature to your library.

samkit-jain commented 2 years ago

Hi @Harshit-tech9, I appreciate your interest in the library. Could you please add more details? A minimal reproducible code example would help as well.

I'm not exactly sure what you mean by support for multiple pages. You can extract tables from multiple pages with:

for page in pdf.pages:
    page.extract_tables()

Is that what you meant?

Harshit-tech9 commented 2 years ago

Hey! Yes, that's what I mean. But when I save the result to CSV using pandas, only the last page's table gets written.


samkit-jain commented 2 years ago

Could you please share the code that you are using so that I can debug further?

Harshit-tech9 commented 2 years ago

Hey! Sure


jsvine commented 2 years ago

Hi @Harshit-tech9 and thanks @samkit-jain! Given that this seems to be a Python coding question rather than a pdfplumber-specific concern, I'm closing this issue. But feel free to continue troubleshooting here!
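
The code behind the "only the last page" symptom was never posted in this thread, so its cause is not confirmed. One common reason for that symptom is writing the output file (or reassigning the result variable) inside the page loop, so each page overwrites the previous one. A minimal sketch of the usual fix, assuming that is what happened: collect the rows from every page first, then write the file once. The helper names `collect_table_rows` and `write_rows_to_csv` and the file names are hypothetical, and the standard-library `csv` module stands in for pandas to keep the example self-contained.

```python
import csv


def collect_table_rows(pages):
    """Gather the rows of every table on every page into one list."""
    rows = []
    for page in pages:
        # extract_tables() returns a list of tables;
        # each table is a list of rows, each row a list of cell values.
        for table in page.extract_tables():
            rows.extend(table)
    return rows


def write_rows_to_csv(rows, out_file):
    """Write all collected rows in a single pass, outside the page loop."""
    writer = csv.writer(out_file)
    writer.writerows(rows)  # None cells are written as empty fields


# Usage with a real PDF (file names are hypothetical):
#
# import pdfplumber
#
# with pdfplumber.open("statement.pdf") as pdf:
#     rows = collect_table_rows(pdf.pages)
#
# with open("statement.csv", "w", newline="") as f:
#     write_rows_to_csv(rows, f)
```

The same idea carries over to pandas: build one DataFrame per table inside the loop, append each to a list, and call `pd.concat` on that list followed by a single `to_csv` after the loop ends. If every page repeats the header row, the repeated rows would need to be dropped before writing.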