pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification
https://pdf-issues.pdfa.org/
64 stars 2 forks source link

/QuadPoints in text markup annotation #68

Closed u-fischer closed 2 years ago

u-fischer commented 3 years ago

Describe the bug

QuadPoints are required for text markup annotation (table 182) and according to the description there the coordinates of their rectangle should be given in counterclockwise order:

The coordinates for each quadrilateral are given in the order: 𝑥1 𝑦1 𝑥2 𝑦2 𝑥3 𝑦3 𝑥4 𝑦4 specifying the four vertices of the quadrilateral in counterclockwise order.

But while this order seem to be used for QuadPoints in link annotations, a number of pdfcreators (we tested with adobe, evince, okular) don't follow the spec for text markup annotations but use another order, and annotations using the coordinate order from the spec aren't rendered correctly by major pdfviewers. So it looks as if it is currently not possible to create a usable and standard conforming text markup annotation.

E.g. these are the quadpoints for a Highlight annotation created by adobe pro:

 /QuadPoints    
 [133.768 652.752 
  169.741 652.752 
  133.768 642.789 
  169.741 642.789 

  133.768 640.797 
  166.423 640.797 
  133.768 630.834 
  166.423 630.834]  

The coordinates go clearly not counterclockwise, but bottom left - bottom right - top left - top right.

For comparision the coordinates for a link over two lines:

/QuadPoints 
  [201.875 654.744 
   320.096 654.744 
   320.096 664.707 
   201.875 664.707 

   133.768 642.789 
   193.821 642.789 
   193.821 652.752 
   133.768 652.752]

This goes correctly counterclockwise (bottom left bottom right top right top left).

A Highlight annoation using this QuadPoints:

/QuadPoints
  [238.68544 603.7864 
   260.67048 603.7864 
   260.67048 679.4078 
   238.68544 679.4078 ]

is rendered like this by adobe:

image

petervwyatt commented 3 years ago

@u-fischer Are you able to share some PDFs?

I would also state that these identical words have been in all PDF specifications since PDF 1.3 circa 2000.

u-fischer commented 3 years ago

@petervwyatt I attach a pdf. The link and the highlight were added with a current adobe pro, the pdf then was uncompressed with pdftk.

It shows the following quadpoints:

Link:

[332.423 704.236 
 477.479 704.236 
 477.479 718.353 
 332.423 718.353 

 133.768 692.281 
 207.512 692.281 
 207.512 706.398 
 133.768 706.398]

Highlight:

[385.116 658.577 
 477.479 658.577 
 385.116 644.46 
 477.479 644.46 

 133.768 646.622 
 325.848 646.622 
 133.768 632.505 
 325.848 632.505] 

I would also state that these identical words have been in all PDF specifications since PDF 1.3 circa 2000.

Yes I know. When I got the distorted highlights, I checked also the other references and googled. I found e.g. this nearly ten old question describing the discrepancy too: https://stackoverflow.com/questions/9855814/pdf-spec-vs-acrobat-creation-quadpoints

But it is a problem if a spec is so at odds with the real implementations.

quadpoints-test-unc.pdf

mkl-public commented 3 years ago

I would also state that these identical words have been in all PDF specifications since PDF 1.3 circa 2000.

A number of issues have been carried along for many revisions of the PDF References and can still be found in the ISO specs nowadays.

As the Adobe PDF References were not normative in nature (as Leonard put it), implementers more followed Adobe Acrobat's example than the exact wording of those references. Only since ISO 32000-1 the idea started to come up that the specification might be more relevant than the Acrobat implementation.

lrosenthol commented 3 years ago

I have been over the Acrobat code and I can't find any case where it writes out the Quads in the wrong order. Here is a fragment of code that serializes a ASFixedQuad into a CosArray.

void PDAnnotSetQuads(PDAnnot annot, ASFixedQuad *quads, ASArraySize numQuads)
{
    CosObj coAnnot = PDAnnotGetCosObj(annot);
    CosDoc coDoc = CosObjGetDoc(coAnnot);
    CosObj coQuads = CosNewArray(coDoc, false, numQuads * 8);

    for (ASUns32 i = 0, n = 0; i < numQuads; ++i)
    {
        CosArrayPut(coQuads, n++, CosNewFixed(coDoc, false, quads[i].bl.h));
        CosArrayPut(coQuads, n++, CosNewFixed(coDoc, false, quads[i].bl.v));
        CosArrayPut(coQuads, n++, CosNewFixed(coDoc, false, quads[i].br.h));
        CosArrayPut(coQuads, n++, CosNewFixed(coDoc, false, quads[i].br.v));
        CosArrayPut(coQuads, n++, CosNewFixed(coDoc, false, quads[i].tr.h));
        CosArrayPut(coQuads, n++, CosNewFixed(coDoc, false, quads[i].tr.v));
        CosArrayPut(coQuads, n++, CosNewFixed(coDoc, false, quads[i].tl.h));
        CosArrayPut(coQuads, n++, CosNewFixed(coDoc, false, quads[i].tl.v));
    }

    CosDictPut(coAnnot, QuadPoints_K, coQuads);
}
u-fischer commented 3 years ago

@lrosenthol well I have no idea about the code but are you sure that tr and tl contain what they should contain and have not be exchanged somewhere?

lrosenthol commented 3 years ago

They would only be exchanged in the case where the page has a transform of some sort (eg. /R key on the page).

u-fischer commented 3 years ago

@lrosenthol I didn't mean exchanged by purpose, but by mistake. I mean the output is clearly wrong, so somewhere in the code there must be a wrong assignment.

petervwyatt commented 2 years ago

This issue looks to be related to a specific implementation and not the PDF specification which is clear and has been for a very long time. Closing.

zauguin commented 2 years ago

This issue looks to be related to a specific implementation and not the PDF specification which is clear and has been for a very long time. Closing.

I think this should be reopened. Yes, it's based on an issue of a specific implementation, but since this was the original implementation for this feature and was never conformant, the non conformant behavior is much more widely implemented than the specified one. I don't think that any PDF files actually follow the current spec. (About a year ago I implemented QuadPoints in a PDF writer based on the specification and the files were broken in almost all viewers, so I had to revert to a different order. I would assume that most other writers had to do something similar)

I think that specifying an order which is different from all implementation does more harm than good, especially since PDF readers can't switch to the specified behavior without becoming incompatible with old files. For e.g. highlight it could be solved as in pdf.js where the order in the PDF is ignored, but e.g. for underlines this will potentially draw the underline on the wrong side (if the annotation is rotated too much). (Underlines are an issue anyway since current viewers only draw underline annotations correctly when the QuadPoints use an order which according to the spec would place the line over the text.)

petervwyatt commented 2 years ago

To be discussed in the next PDF TWG

lrosenthol commented 2 years ago

Here is the code from the well known implementation in discussion where it reads the QuadPoints from an annotation

CosObj coQuads = CosDictGet(coAnnot, QuadPoints_K);
for (ASUns32 i = 0, n = 0; i < nQuads; ++i)
{
    quads[i].bl.h = CosFixedValue(CosArrayGet(coQuads, n++));
    quads[i].bl.v = CosFixedValue(CosArrayGet(coQuads, n++));
    quads[i].br.h = CosFixedValue(CosArrayGet(coQuads, n++));
    quads[i].br.v = CosFixedValue(CosArrayGet(coQuads, n++));
    quads[i].tr.h = CosFixedValue(CosArrayGet(coQuads, n++));
    quads[i].tr.v = CosFixedValue(CosArrayGet(coQuads, n++));
    quads[i].tl.h = CosFixedValue(CosArrayGet(coQuads, n++));
    quads[i].tl.v = CosFixedValue(CosArrayGet(coQuads, n++));
}

However, I found code in said well known implementation to handle the case where an implementor gets the Quads wrong, and it needs to adjust them into a proper bounding rect

/*
** FixedQuadToEnclosingRect - return "upright" enclosing rect
*/
void ACROEXPORTPRIV FixedQuadToEnclosingRect( const ASFixedQuad *fq, ASFixedRectP fr )
{
    ASFixed t1, t2;

    /* quads are not necessarily "upright" => must test all four corners */
    t1 = MIN(fq->tl.h,fq->tr.h);
    t2 = MIN(fq->bl.h,fq->br.h);
    fr->left = MIN(t1,t2);

    t1 = MAX(fq->tl.v,fq->tr.v);
    t2 = MAX(fq->bl.v,fq->br.v);
    fr->top = MAX(t1,t2);

    t1 = MAX(fq->tl.h,fq->tr.h);
    t2 = MAX(fq->bl.h,fq->br.h);
    fr->right = MAX(t1,t2);

    t1 = MIN(fq->tl.v,fq->tr.v);
    t2 = MIN(fq->bl.v,fq->br.v);
    fr->bottom = MIN(t1,t2);
}
petervwyatt commented 2 years ago

Discussed in PDF TWG - specification is very clear. Implementations are free to go above and beyond the spec. No changes to spec.