pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification
https://pdf-issues.pdfa.org/

Length field in stream dictionaries when encryption is used #469

Closed seehuhn closed 1 week ago

seehuhn commented 2 months ago

The PDF-2.0 spec contains the following information about the Length field in a stream dictionary:

I believe that "number of bytes to be encrypted" must be the length of the cleartext. In contrast, the "number of bytes to be decrypted" could be read as the length of the ciphertext, and Table 5 also seems to require Length to be the length of the ciphertext.

What is the correct value for Length when encryption is used? It would be nice if the spec were more explicit about this.

petervwyatt commented 2 months ago

No - the Length entry is always the length of the data (in bytes) between the stream and endstream keywords, except possibly for an extra EOL sequence. It applies to all the filters that are specified (since you can chain/cascade them), and encryption is no different. See Table 5.

PDF 2.0 introduced a new optional entry, DL, to represent the output length of the decoded/decrypted (defiltered) data.
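To make the difference concrete, here is a minimal Python sketch of how the stored length relates to the length of the bytes handed to the cipher (that is, the data after any other filters have already been applied). It assumes that RC4 leaves the length unchanged, and that AES is used in CBC mode with a 16-byte initialization vector stored in front of the data and padding that is always added. The function name and the "RC4"/"AES" labels are hypothetical and not taken from any library.

```python
AES_BLOCK = 16  # AES block size in bytes; also the size of the CBC IV


def encrypted_length(cleartext_len: int, method: str) -> int:
    """Return the number of bytes that end up between stream and endstream.

    Assumptions (not from the thread): `method` is the hypothetical label
    "RC4" or "AES"; AES output is IV + padded data, and padding is always
    added, so a full extra block appears when the input is block-aligned.
    """
    if cleartext_len < 0:
        raise ValueError("length must be non-negative")
    if method == "RC4":
        # Stream cipher: ciphertext is exactly as long as the cleartext.
        return cleartext_len
    if method == "AES":
        # Round up to the next block boundary, always adding 1..16 pad bytes.
        padded = (cleartext_len // AES_BLOCK + 1) * AES_BLOCK
        return AES_BLOCK + padded  # leading IV + padded ciphertext
    raise ValueError(f"unknown method: {method!r}")


# Under these assumptions, 100 bytes of cleartext need Length 128 with AES,
# but Length 100 with RC4.
print(encrypted_length(100, "AES"), encrypted_length(100, "RC4"))
```

With AES the two numbers never coincide, so a producer that writes the cleartext length into Length will always be wrong in that case.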

However, a commonly seen extant data error is that many PDF producers get this wrong...

seehuhn commented 2 months ago

Yes, I agree that this is what is meant. But doesn't the claim that Length is "the number of bytes to be encrypted" contradict this? Maybe that statement is even the reason that some PDF producers get this wrong?

Suggestion: Why not replace the quoted sentence near the end of section 7.6.3 with the following:

The value of the Length entry in the stream dictionary shall be the length of the encrypted stream data.

mkl-public commented 2 months ago
  • For encrypted documents, section 7.6.3 (General encryption algorithm) states near the end: "The number of bytes to be encrypted or decrypted shall be given by the Length entry in the stream dictionary."

I think this sentence makes some sense if you read it case by case:

So in both cases, Length contains the number of bytes between stream[EOL] and endstream.
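A rough way to check this reading against a file is sketched below in Python. It assumes the raw bytes of a single stream object are already in memory and that the Length value has been parsed separately; the function name is hypothetical. It also assumes the first occurrence of "stream" in the object is the keyword itself, which a real parser would not rely on.

```python
def length_matches(obj: bytes, length: int) -> bool:
    """Check that `length` bytes after `stream`+EOL are followed by `endstream`.

    Hypothetical helper: `obj` holds one stream object, `length` is the
    already-parsed value of its Length entry.
    """
    start = obj.find(b"stream")
    if start < 0:
        return False
    pos = start + len(b"stream")
    # The stream keyword is followed by CRLF or by a single LF.
    if obj.startswith(b"\r\n", pos):
        pos += 2
    elif obj.startswith(b"\n", pos):
        pos += 1
    else:
        return False
    end = pos + length
    if end > len(obj):
        return False
    tail = obj[end:]
    # An optional EOL may precede endstream; it is not counted in Length.
    return any(tail.startswith(eol + b"endstream")
               for eol in (b"\r\n", b"\n", b"\r", b""))


sample = b"<< /Length 5 >>\nstream\nhello\nendstream"
print(length_matches(sample, 5))  # Length covers exactly b"hello"
print(length_matches(sample, 3))  # too short: endstream does not follow
```

The check never looks at whether the bytes are encrypted, which is the point: it compares Length only with what is physically stored in the file.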

Actually, though, the "shall" is indeed wrong: Length contains these numbers by the definition of stream objects, so this is no new requirement.

The Adobe Reference here said: "The number of bytes to be encrypted or decrypted is given by the Length entry in the stream dictionary." Here the PDF Reference really meant to state a fact, not to imply a new requirement. IMO someone overeagerly added one "shall" too many during ISO-fication.

petervwyatt commented 2 months ago

I agree - "shall" in this case is wrong. Let's change it back....

petervwyatt commented 1 month ago

PDF TWG agree