Closed madhur-dumane closed 6 years ago
Use mvn version 0.0.3
I have modified the library so that you can supply your own stripper using PDFBox. The sample code below should retrieve the text sorted by position.
PDFTextStripper stripper = new PDFTextStripper();
stripper.setSortByPosition(true);
pdfutil.useStripper(stripper);
pdfutil.getText(filePath);
I have not changed the Maven version, but with the code above it is working. Thank you so much for your help.
Glad that it worked
I am using the code below to read the whole text of each PDF into a string and then compare the two strings:
String str = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads\\" + prereport + ".pdf");
String str1 = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads\\" + postreport + ".pdf");
System.out.println("Check the text from both PDFs : " + str.equalsIgnoreCase(str1));
Sometimes the text is not retrieved in sequential order. For example:
From the first PDF, the retrieved text looks like: $497.10 0.51 - Investment Cash
From the second PDF, the retrieved text looks like: $497.10 -0.51 Investment Cash
One string contains "0.51 -" and the other contains "-0.51", so the PDF comparison fails. Please see the screenshot above for how this looks in both PDFs. Ideally the text should be retrieved sequentially and the PDF comparison should succeed. Please help me resolve this issue.
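One possible workaround, offered only as a sketch and not as something the library does itself, is to normalize both strings before comparing them so that a detached trailing minus sign ("0.51 -") and a leading one ("-0.51") end up in the same form. The class name PdfTextNormalizer and the method normalize are hypothetical names made up for this example, and the regular expression assumes that a lone "-" directly after a number is always a negative sign.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of pdfutil or PDFBox): rewrites "0.51 -" as "-0.51"
// and collapses whitespace so that two extractions can be compared more leniently.
class PdfTextNormalizer {

    // A number (optional thousands separators and decimals) followed by a detached "-"
    // that is itself followed by whitespace or the end of the text.
    private static final Pattern TRAILING_MINUS =
            Pattern.compile("(\\d[\\d,]*(?:\\.\\d+)?)\\s*-(?=\\s|$)");

    static String normalize(String text) {
        // Collapse runs of whitespace (including line breaks) into single spaces.
        String collapsed = text.trim().replaceAll("\\s+", " ");
        // Move the detached minus sign in front of the number it follows.
        return TRAILING_MINUS.matcher(collapsed).replaceAll("-$1");
    }

    public static void main(String[] args) {
        // Sample strings taken from the report above.
        String first = "$497.10 0.51 - Investment Cash";
        String second = "$497.10 -0.51 Investment Cash";
        System.out.println(normalize(first).equalsIgnoreCase(normalize(second)));
    }
}
```

In the comparison code, this would mean calling normalize on str and str1 before equalsIgnoreCase. Note that the pattern would also rewrite a dash used as a separator or range after a number (for example "5 - 10"), so it is only suitable when such dashes do not occur in the reports.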