mattosaurus / PgpCore

.NET Core class library for using PGP
MIT License
244 stars 98 forks source link

Verified file has extra characters. #262

Closed LucTGitHub closed 11 months ago

LucTGitHub commented 11 months ago

Hello, I can sign an XML file, and then verify it. The veried file does not match the original. There are a couple extra characters at beginning.

Example

You can use XML in this zip file for testing. Upzip the file and sign the XML. When you verify, the verfied file has the extra characters.  input.zip

mattosaurus commented 11 months ago

Looks like this is the UTF8 encoding BOM for the file.

I can see this being added for your file but not in other XML docs I'm testing with. Possibly it's getting added twice.

mattosaurus commented 11 months ago
image

Your file is encoded as UTF-8-BOM, and includes those 3 characters as the first 3 bytes. When it's opened in Notepad++ or read in using something like File.ReadAllLines then these bytes are ignored and not included in the viewed string.

If you read it in byte by byte as a stream then they are included.

using (Stream inputFileStream = new FileInfo(@"\input\input.xml").OpenRead())
{
    // Read the stream character by character and create a string
    StringBuilder sb = new StringBuilder();
    while (inputFileStream.Position < inputFileStream.Length)
    {
        sb.Append((char)inputFileStream.ReadByte());
    }

    // String includes UTF8 BOM characters
    string input = sb.ToString();
}

Because the various methods (Sign and Verify in this case) all work on the raw stream they read in the initial 3 BOM bytes and treat them as any other content character would be treated. I think this is expected behaviour, however I agree it's not ideal.

I'm not really sure how best to deal with this, though my suggestion would be that the files should be generated with the UTF-8-BOM or it should be stripped out before getting to PGPCore. I don't really like the idea of selectively ignoring these characters once they've been passed to PGPCore as they could have been done so as part of a legitimate message but if you've got any suggestions I'm happy to discuss.

LucTGitHub commented 11 months ago

Thanks for looking into this! With this same input.xml file, I don't see this issue when I "Encrypt and Sign" and then "Decrypt and Verify". Do you know why having encryption and decryption seems to help in this situation?

mattosaurus commented 11 months ago

I think it's because we use StreamWriter in the Verify method, and output the message bytes to it one by one as we read them for signature verification whereas for the Decrypt method we pipe the stream directly to the output stream.

https://github.com/mattosaurus/PgpCore/blob/bb117c44554d74941b73defe35419151be02662d/PgpCore/PGP.VerifyAsync.cs#L107-L112

There might be a better way to do this, I'll take a look into it later this week.

LucTGitHub commented 11 months ago

I also found that Verify seems to always output UTF-8-BOM files. In this test, I create a text file with Notepad and put just a sentense in there. The file is UTF-8 encoded. Then I sign the file and verify it. The verified file is encoded in UTF-8-BOM. I wonder why Verify takes a signed UTF-8 file and outputs a UTF-8-BOM file?

LucTGitHub commented 11 months ago

Just adding another test case here Sign and Verify also has issue with a docx. input.docx

No run-time errors, but the verified file cannot be opened with Microsoft Words. This is not an issue with Encrypt-Sign and Decrypt-Verify. Thanks!

mattosaurus commented 11 months ago

This was an issue with converting the decrypted bytes to chars when writing them to the output stream so the BOM got written as literal characters. I've switched to using a BinaryWriter from StreamWriter and writing bytes out directly now. This should be fixed in v6.3.

LucTGitHub commented 11 months ago

I'm still having the same issue with version v6.3. But if I clone the project and reference it in Visual Studio, then it works. Thanks!

mattosaurus commented 11 months ago

Hmm, I think I must have published an older build of the package.

I've pushed v6.3.1 and that seems to work as expected now.

LucTGitHub commented 11 months ago

Yes, that worked. Thank you so much for your help!!!