pesco opened this issue 2 years ago.
Depends on what you mean by "would be allowed".
Is that fragment, BX 1 2 \o EX, allowed to be present in a content stream? Yes, it is. Why? Because nothing specifically prohibits it. It's conceptually the same as having this fragment:
1 2 \o (A String) Tj
It's just a series of valid PDF objects followed by an operator.
How such content is processed is a separate question, and ISO 32000 deliberately does not spell it out entirely, because doing so would force specific implementation decisions (e.g. whether or not the content stream parser is stack-based).
What kind of object is \o? Unless I am missing something, it is not a valid object, so it cannot be an operand. But can it be an operator? What else can be an operator?
The issue is that the spec does not specify what operators look like.
I agree the spec doesn't specify exactly what an operator looks like.
We leave what "all other things" might be, from a parsing/tokenization point of view, to inference from the lexical conventions: the white-space list, the delimiter list, the definitions of the basic PDF objects, etc. Effectively, an operator is the same kind of token, except that operators only occur inside content streams (and because the content stream dialect also prohibits indirect references, it is slightly different from the syntax outside content streams).
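To make the "inferred from the lexical conventions" reading concrete, here is a minimal Python sketch of a tokenizer that uses only the white-space and delimiter character lists and classifies every other run of regular characters as either a number or a bare keyword (a candidate operator). It is an illustration under stated assumptions, not a conforming parser: the function name tokenize is hypothetical, and hex strings, arrays, dictionaries and comments are left out.

```python
import re

# PDF white-space and delimiter characters (ISO 32000, 7.2); everything else
# is a "regular" character.
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"
NUMBER = re.compile(rb"[+-]?(\d+\.?\d*|\.\d+)")

def _regular_run(data: bytes, i: int) -> int:
    """Return the index just past the run of regular characters starting at i."""
    while i < len(data) and data[i] not in WHITESPACE and data[i] not in DELIMITERS:
        i += 1
    return i

def tokenize(data: bytes):
    """Yield (kind, value) pairs with kind in {'number', 'name', 'string', 'keyword'}."""
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c in WHITESPACE:
            i += 1
        elif c == ord("("):
            # Literal string: track parenthesis depth and skip backslash escapes.
            depth, j = 1, i + 1
            while j < n and depth:
                if data[j] == ord("\\"):
                    j += 2
                    continue
                if data[j] == ord("("):
                    depth += 1
                elif data[j] == ord(")"):
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated literal string")
            yield ("string", data[i + 1 : j - 1])
            i = j
        elif c == ord("/"):
            # Name object: the solidus followed by a run of regular characters.
            j = _regular_run(data, i + 1)
            yield ("name", data[i + 1 : j])
            i = j
        elif c in DELIMITERS:
            raise ValueError(f"delimiter {chr(c)!r} is not handled in this sketch")
        else:
            # Any other maximal run of regular characters: a number if it looks
            # like one, otherwise a bare keyword (true/false/null would also land
            # here; a fuller parser would check for them).
            j = _regular_run(data, i)
            token = data[i:j]
            kind = "number" if NUMBER.fullmatch(token) else "keyword"
            yield (kind, token)
            i = j
```

With this reading, tokenize(rb"1 2 \o (A String) Tj") produces two numbers, the keyword \o, a string and the keyword Tj, which matches the "series of valid PDF objects followed by an operator" view.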
So in your BX ... EX example, \o should parse as a single token and would be seen as an operator with two integer operands (1 and 2) on the stack. Because it's between BX and EX, this unknown thing (presumably an operator, since it didn't parse as anything else) should be skipped over...
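The skipping behaviour can be sketched as a small stack-based interpreter. This assumes one particular implementation choice (a stack-based parser, which ISO 32000 deliberately does not mandate), tokens already split on white space, numeric operands only, and a hypothetical KNOWN_OPERATORS set standing in for whatever operators a real processor recognises.

```python
import re

# Assumption: operands are PDF numbers only; tokens are already split on white space.
PDF_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")

# Hypothetical subset of operators this toy processor "recognises".
KNOWN_OPERATORS = {"q", "Q", "cm", "re", "f", "Td", "Tj"}

def run(tokens):
    """Return the (operator, operands) pairs that would be executed.

    Unrecognised operators inside BX ... EX (which may nest) are skipped along
    with their operands; outside a compatibility section they are an error.
    """
    operands, executed, depth = [], [], 0
    for tok in tokens:
        if PDF_NUMBER.fullmatch(tok):
            operands.append(float(tok))  # push operand onto the stack
            continue
        # Anything that is not an operand is treated as an operator.
        if tok == "BX":
            depth += 1
        elif tok == "EX":
            if depth == 0:
                raise ValueError("EX without a matching BX")
            depth -= 1
        elif tok in KNOWN_OPERATORS:
            executed.append((tok, operands))
        elif depth == 0:
            raise ValueError(f"unrecognised operator {tok!r} outside BX/EX")
        # Otherwise: unrecognised operator inside BX/EX, ignored with its operands.
        operands = []  # every operator consumes the operand stack
    return executed
```

For example, run(r"BX 1 2 \o EX 0 0 10 10 re f".split()) drops \o together with its operands 1 and 2 and returns [("re", [0.0, 0.0, 10.0, 10.0]), ("f", [])], whereas run(r"1 2 \o".split()) raises ValueError because \o appears outside a compatibility section.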
But a lot of this is not stated directly...
So what's the intent? An operator could be any token that is not something else? A more direct definition might be helpful, even if it were more restrictive (like the syntax of names sans solidus).
> A more direct definition might be helpful
It might be, but you are almost 30 years too late to do so.
> even if it were more restrictive
Why would we do that? That would make existing PDF files no longer compliant, and that isn't good business...
I am not trying to be argumentative. I have no idea what the "in the wild" reality of this syntax is, so I was considering it entirely possible that it had never been anywhere near as generic as "any token at all that isn't otherwise recognized", but rather closer to what is usually used for keywords (which operators are defined to be), namely something like alphanumeric strings, or strings of regular characters if you wanted to be generous.
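The candidate definitions discussed in this thread can be compared with three hypothetical predicates: any run of regular characters that is not otherwise an object, the name syntax without the solidus, and plain alphanumerics. This is only a sketch; the function names are made up, and rule B applies the #xx escape rule for name objects while ignoring the recommendation that characters outside 21h-7Eh be escaped.

```python
import re

# PDF white-space and delimiter characters; everything else is "regular".
WHITESPACE = "\x00\t\n\x0c\r "
DELIMITERS = "()<>[]{}/%"
PDF_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")
KEYWORD_OBJECTS = {"true", "false", "null"}

def any_regular_token(tok: str) -> bool:
    """Rule A (hypothetical): a non-empty run of regular characters that is
    not a number, boolean or null."""
    return (
        tok != ""
        and all(ch not in WHITESPACE and ch not in DELIMITERS for ch in tok)
        and PDF_NUMBER.fullmatch(tok) is None
        and tok not in KEYWORD_OBJECTS
    )

def name_sans_solidus(tok: str) -> bool:
    """Rule B (hypothetical): rule A, plus every '#' must introduce a
    two-digit hexadecimal code, as in name objects."""
    return any_regular_token(tok) and re.fullmatch(r"(?:[^#]|#[0-9A-Fa-f]{2})*", tok) is not None

def alphanumeric(tok: str) -> bool:
    """Rule C (hypothetical): ASCII letters and digits only, starting with a letter."""
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", tok) is not None

if __name__ == "__main__":
    for tok in ["Tj", "T*", "'", '"', "d0", "\\o", "a#20b", "a#2"]:
        print(f"{tok!r:8} A={any_regular_token(tok)!s:5} "
              f"B={name_sans_solidus(tok)!s:5} C={alphanumeric(tok)}")
```

One thing the comparison makes visible: a strictly alphanumeric rule would reject standard operators such as T*, ' and ", while \o passes both of the more generous rules.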
In any case, as Peter wrote, the spec leaves this implicit and it should probably spell it out.
This and other related/interlinked issues (#194 #199 #201 #208 #209) are being discussed in the "Securing PDF" DG of ISO TC 171 SC 2 WG 8 and will be labelled as "Parked" here in GitHub until such time as a set of solutions can be proposed.
See Errata #363: I think that formally defining "PDF keywords" at a lexical level flows through and also resolves this issue.
Will be resolved as part of the resolution to Errata #363.
The syntax for content stream operators is not explicitly defined. This is a problem with compatibility sections as introduced by the last paragraph of 7.8.2 "Content streams":
It is not clear if this means that the following would be allowed:
BX 1 2 \o EX
Clearly, 1 and 2 are objects and thus operands, followed by \o, a sequence of regular characters that is not an object. Paragraph 5 of 7.8.2 notes:
This sounds like operator keywords might be meant to follow the same syntax as name objects after the initial solidus, but it stops short of saying so.