ralsina / pdfrw

Automatically exported from code.google.com/p/pdfrw

Spurious brackets in URIs. #14

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

1. Get a PDF with a URI in an annotation.
2. Run this code on it:

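#!/usr/bin/env python

import os
import sys

from pdfrw import PdfReader, PdfWriter


def convert(inpfn, outfn):
    pdf = PdfReader(inpfn)

    # Walk each page's annotations and re-assign every URI to itself.
    # This should be a no-op, but it triggers the bug.
    for page in pdf.Root.Pages.Kids:
        if page.Annots is not None:
            for annot in page.Annots:
                if annot.A is not None and annot.A.URI is not None:
                    annot.A.URI = annot.A.URI

    outdata = PdfWriter()
    outdata.trailer = pdf
    outdata.write(outfn)


# Output goes to out/<input name>; the out/ directory must already exist.
for inpfn in sys.argv[1:]:
    print('%s :' % inpfn)
    outfn = os.path.join('out', inpfn)
    convert(inpfn, outfn)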

Expected output: the output PDF should be identical to the input.

Actual result: in the output PDF the URI has extra brackets (parentheses) around it, i.e. instead of

http://www.example.com

the URI now points to:

(http://www.example.com)

which fails to open correctly in any PDF reader.

Using pdfrw version 0.1-1 on Ubuntu 14.04.
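
For reference, the PDF file format writes literal strings inside parentheses, so the extra brackets look like string delimiters ending up inside the value. That is a guess about the cause, not something confirmed in this report. A minimal sketch of removing the delimiters in plain Python, where `strip_pdf_literal` is a hypothetical helper and not part of pdfrw:

```python
def strip_pdf_literal(raw):
    """Return the text inside a PDF literal string such as '(http://x)'.

    Hypothetical helper, not part of pdfrw. Only the escapes for
    backslash and parentheses are handled; input that is not wrapped
    in parentheses is returned unchanged.
    """
    if len(raw) < 2 or not (raw.startswith('(') and raw.endswith(')')):
        return raw
    body = raw[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # A backslash followed by '(' , ')' or '\' stands for that character.
        if ch == '\\' and i + 1 < len(body) and body[i + 1] in '()\\':
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


print(strip_pdf_literal('(http://www.example.com)'))  # prints http://www.example.com
```

This only illustrates the relationship between the two forms of the URI shown above; it does not fix the writer itself.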

Original issue reported on code.google.com by a.j.bux...@gmail.com on 21 Oct 2014 at 4:16

GoogleCodeExporter commented 9 years ago
Test case, examples:

http://al.robotfuzz.com/~al/pdfrw/

Original comment by a.j.bux...@gmail.com on 21 Oct 2014 at 4:24