ststeiger / PdfSharpCore

Port of the PdfSharp library to .NET Core - largely removed GDI+ (only missing GetFontData - which can be replaced with freetype2)

PdfSharpCore.Pdf.IO.PdfReaderException: Unexpected character '0x0069' in PDF stream. #84

Open fionik opened 4 years ago

fionik commented 4 years ago

I am attempting to open a PDF file using PdfSharpCore and get the following exception:

PdfSharpCore.Pdf.IO.PdfReaderException: 'Unexpected character '0x0069' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.'

JLLAP_14022018101618_EMAIL_14022018101618_99_00005_001.pdf

Exception stack:

Unhandled exception. PdfSharpCore.Pdf.IO.PdfReaderException: Unexpected character '0x0069' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.
   at PdfSharpCore.Internal.ParserDiagnostics.ThrowParserException(String message)
   at PdfSharpCore.Internal.ParserDiagnostics.HandleUnexpectedCharacter(Char ch)
   at PdfSharpCore.Pdf.IO.Lexer.ScanLiteralString()
   at PdfSharpCore.Pdf.IO.Lexer.ScanNextToken()
   at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop)
   at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
   at PdfSharpCore.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream)
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider)
   at PdfSharpCore.Pdf.IO.PdfReader.Open(String path, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider provider)
   at PdfSharpCore.Pdf.IO.PdfReader.Open(String path)
   at PDFSharpTest.Program.Main(String[] args) in C:\Users\Andrew\Documents\Visual Studio 2019\Projects\PDFSharpTest\PDFSharpTest\Program.cs:line 11

This can be reproduced with a trivial program:

using System;
using PdfSharpCore.Pdf;
using PdfSharpCore.Pdf.IO;

namespace PDFSharpTest
{
    class Program
    {
        static void Main(string[] args)
        {
            PdfDocument document = PdfReader.Open(args[0]);
            Console.WriteLine($"Page count: {document.PageCount}.");
        }
    }
}

The library I am using is PdfSharpCore 1.1.26 downloaded from NuGet.

The file itself appears to be fine: Acrobat Reader opens it without any issues.

fionik commented 4 years ago

The source of the problem appears to be this string literal:

\\uslil620.am.jllnet.com\invoices\2018\AUD\201801\A2210_AU003-0105254__5106022609_AU003.pdf

I understand that all of these backslashes should have been escaped. They were not, which is technically a violation of the standard. On the other hand, this document says:

If the character following the backslash is not one of those shown in the table, the backslash is ignored.

So to me it looks like the library should have ignored the illegal backslash rather than throwing an exception.
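For illustration, here is a minimal sketch of what such lenient handling could look like. It is not PdfSharpCore code: `LenientPdfString` and `Unescape` are hypothetical names, the input is assumed to be the text between the outer parentheses of a literal string, and octal escapes are simply cast to a `char`. Note that 0x0069 is the character `i`, which matches the first invalid escape in the literal above, the `\i` in `\invoices`.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of PdfSharpCore.
static class LenientPdfString
{
    // Decodes escape sequences in the body of a PDF literal string.
    // A backslash followed by an unrecognized character is dropped and
    // the character itself is kept, instead of throwing.
    public static string Unescape(string raw)
    {
        var sb = new StringBuilder(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            if (c != '\\') { sb.Append(c); continue; }
            if (++i >= raw.Length) break; // lone trailing backslash: drop it

            char next = raw[i];
            switch (next)
            {
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case 'b': sb.Append('\b'); break;
                case 'f': sb.Append('\f'); break;
                case '(': case ')': case '\\': sb.Append(next); break;
                case '\r': // backslash + end of line: line continuation
                    if (i + 1 < raw.Length && raw[i + 1] == '\n') i++;
                    break;
                case '\n': break;
                default:
                    if (next >= '0' && next <= '7')
                    {
                        // Octal escape: one to three octal digits.
                        int value = 0, digits = 0;
                        while (digits < 3 && i < raw.Length && raw[i] >= '0' && raw[i] <= '7')
                        {
                            value = value * 8 + (raw[i] - '0');
                            i++; digits++;
                        }
                        i--; // the for loop advances past the last digit
                        sb.Append((char)(value & 0xFF));
                    }
                    else
                    {
                        sb.Append(next); // unknown escape: ignore the backslash
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

With this approach `\i` simply becomes `i`, so the parser could carry on. The decoded path would still not match the original, though: the backslashes are lost, and sequences such as `\201` would be read as octal escapes. Lenient parsing lets the document open; it does not recover the intended string.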

chrisnurse commented 3 years ago

I'm sure you realise that the document above contains sensitive information that you don't want disclosed on the internet?

fionik commented 3 years ago

> I'm sure you realise that the document above contains sensitive information that you don't want disclosed on the internet?

I would be happy if there was a secure way to communicate such documents to the developers without disclosing them on the Internet.

ranausman008 commented 2 years ago

Is there any patch or fix for this issue? I am currently facing exactly the same problem. Any information would be a great help.

jungwonbae commented 9 months ago

Save the PDF file under a different name and merge it.