tabulapdf / tabula-java

Extract tables from PDF files
MIT License

Multiple tables in a PDF. Extract content from a particular table without area. #225

Open HarrisLF opened 6 years ago

HarrisLF commented 6 years ago

Hi,

When I have multiple tables in my PDF, how can I get the content of a specific table without using the -a (area) option? The area would not be the same across all PDFs. I have attached the PDF that I used. I need the content of the table with the headers Date/Mode/Particulars/Deposits/Withdrawals/Balance.

The command that I ran is below. I am running it from a Java project:

Process runtime = Runtime.getRuntime().exec("cmd /c start \"Balance Sheet Analysis\" java -jar E://WS//Tabula//Trash//tabula-1.0.1-jar-with-dependencies.jar -f CSV -d -g C://RPA//manappuramOnBoarding//tabula-java-ws//SamplePDF.pdf -o ICICI-E-Statement.csv");

When I run the above command, I get only the headers from all the tables in the CSV file.

SamplePDF.pdf

rakshitcgupta commented 6 years ago

With this library, you get all the tables in the PDF if you don't specify an area.
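
Building on that: since every table comes back when no area is given, one possible approach is to extract them all and then select the wanted one by its header row rather than by its position. The sketch below is only an illustration under stated assumptions. TableSelector, normalize, isHeaderRow and findByHeader are hypothetical names, not part of tabula-java, and the code assumes each extracted table has already been converted into rows of cell strings (for example by reading tabula's output back in).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical helper (not part of tabula-java): picks one table out of
 * several by matching its header row instead of its area on the page.
 * Assumes each table is already a list of rows, each row a list of cell strings.
 */
final class TableSelector {

    private TableSelector() {}

    /** Trims, collapses whitespace and lower-cases a cell so " Deposits" matches "DEPOSITS". */
    static String normalize(String cell) {
        return cell == null ? "" : cell.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Returns true when the row has exactly the expected headers, in the same order. */
    static boolean isHeaderRow(List<String> row, List<String> expectedHeaders) {
        if (row.size() != expectedHeaders.size()) {
            return false;
        }
        for (int i = 0; i < row.size(); i++) {
            if (!normalize(row.get(i)).equals(normalize(expectedHeaders.get(i)))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Scans the tables in order and returns the rows of the first table that
     * contains the header row, starting at the header itself.
     * Returns Optional.empty() when no table matches.
     */
    static Optional<List<List<String>>> findByHeader(
            List<List<List<String>>> tables, List<String> expectedHeaders) {
        for (List<List<String>> table : tables) {
            for (int r = 0; r < table.size(); r++) {
                if (isHeaderRow(table.get(r), expectedHeaders)) {
                    return Optional.of(new ArrayList<>(table.subList(r, table.size())));
                }
            }
        }
        return Optional.empty();
    }
}
```

For the PDF in question, the expected headers would be Date, Mode, Particulars, Deposits, Withdrawals and Balance. Matching on normalized text rather than exact strings is a deliberate choice here, because extracted cells often carry stray whitespace or differ in case. If the extraction splits or merges header cells, the match would need to be loosened further.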