tsiank opened 3 years ago
Does this only occur with a certain PDF? If so, would you be so kind as to upload it?
I have several PDFs to convert in a `for` loop. The bug seems to appear once the total page count across all PDFs exceeds roughly 300. My code snippet is below:
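```csharp
using System.Collections.Generic;
using System.Drawing.Imaging;
using System.IO;
using System.Text.RegularExpressions;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;
// Plus the itext.pdfimage namespace that provides the ConvertPageToBitmap() extension method.

public class PdfOp
{
    // imageSource is not used here; it is kept to match the original signature.
    public static void ConvertPdfToJpg(string pdffile, string imageSource, Dictionary<string, string> dict)
    {
        // Output folder next to the PDF, named after it without the extension
        string pdffileDir = Path.ChangeExtension(pdffile, null);
        Directory.CreateDirectory(pdffileDir); // no-op if it already exists

        // Group 1 captures the 15-17 digit student number
        var studentNumber = new Regex(@"Student Number:.*?([0-9]{15,17})");

        // Dispose the document (and its reader) so file handles are released between loop iterations
        using (var pdfDoc = new PdfDocument(new PdfReader(pdffile)))
        {
            int numberOfPages = pdfDoc.GetNumberOfPages();
            for (int i = 1; i <= numberOfPages; i++)
            {
                PdfPage page = pdfDoc.GetPage(i);
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentPageText = PdfTextExtractor.GetTextFromPage(page, strategy);

                // The original matchret[0] threw when a page had no student number;
                // "page{i}" is a hypothetical fallback name for such pages
                Match match = studentNumber.Match(currentPageText);
                string examid = match.Success ? match.Groups[1].Value : $"page{i}";

                // ConvertPageToBitmap() is the call that throws the ArgumentException reported in this issue
                using (var bitmap1 = page.ConvertPageToBitmap())
                {
                    // Known IDs use the mapped name; unknown ones get a "!!!" prefix
                    string filename = dict.TryGetValue(examid, out var mapped) ? mapped : $"!!!{examid}";
                    bitmap1.Save(Path.Combine(pdffileDir, filename + ".jpg"), ImageFormat.Jpeg);
                }
            }
        }
    }
}
```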
Because the PDF is private, may I have your email address so I can send it to you?
I have found another bug: some PDF content is lost in the converted JPG.
Please create a separate issue and provide an example PDF.
I downloaded your source code and tried to debug it. I found that if the key type of `chunkDictionairy` is set to `double`, this bug is fixed, but I'm not sure whether it still works when the total number of PDF pages grows further.
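The exception in the stack trace comes from `SortedDictionary<TKey,TValue>.Add`, which throws `ArgumentException` whenever a new key compares equal to an existing one. Changing the key type to `double` cannot rule out two chunks producing exactly the same key value. A minimal sketch, assuming chunks are ordered by a single numeric position (the library's real key computation is not shown in this thread), contrasts a plain numeric key with a composite `(Position, Seq)` key that breaks ties by insertion order. `ChunkOrdering`, `TryAddPlain` and `OrderByPosition` are hypothetical names for illustration, not itext.pdfimage APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of itext.pdfimage.
public static class ChunkOrdering
{
    // Plain numeric key: a second chunk at the same position makes Add throw
    // ArgumentException ("An item with the same key has already been added").
    public static bool TryAddPlain(SortedDictionary<double, string> dict, double position, string text)
    {
        try
        {
            dict.Add(position, text);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    // Composite key: (position, insertion sequence). ValueTuple compares element by
    // element, so chunks stay sorted by position and equal positions keep arrival order.
    public static List<string> OrderByPosition(IEnumerable<(double Position, string Text)> chunks)
    {
        var ordered = new SortedDictionary<(double Position, int Seq), string>();
        int seq = 0;
        foreach (var (position, text) in chunks)
        {
            // seq is unique, so this Add never sees a duplicate key
            ordered.Add((position, seq++), text);
        }
        return new List<string>(ordered.Values);
    }
}
```

With the composite key, three chunks at positions 32768, 32768 and 100 come back as the chunk at 100 first, then the two at 32768 in the order they were added, and no exception is thrown regardless of the numeric key type.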
I ran into an issue with this line:

`var bitmap1 = page.ConvertPageToBitmap();`

When I use this pdfimage converter to convert a PDF to JPG, it throws the following exception:

```
System.ArgumentException
  HResult=0x80070057
  Message=An item with the same key has already been added. Key: [32768, itext.pdfimage.Models.TextChunk]
  Source=System.Collections
  StackTrace:
   at System.Collections.Generic.TreeSet`1.AddIfNotPresent(T item)
   at System.Collections.Generic.SortedDictionary`2.Add(TKey key, TValue value)
   at iText.Kernel.Pdf.Canvas.Parser.Listener.TextListener.EventOccurred(IEventData data, EventType type)
   at iText.Kernel.Pdf.Canvas.Parser.Listener.FilteredEventListener.EventOccurred(IEventData data, EventType type)
   at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.EventOccurred(IEventData data, EventType type)
   at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.InvokeOperator(PdfLiteral operator, IList`1 operands)
   at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.ProcessContent(Byte[] contentBytes, PdfResources resources)
   at itext.pdfimage.PdfToImageConverter.ConvertToBitmap(PdfPage pdfPage)
   at ATA.DeptOfProject4.EtsTest.PdfOp.ConvertPdfToJpg(String pdffile, String imageSource, Dictionary`2 dict) in D:\pdfopt.cs:line 57
```