PDF/A validation broken after running pdfsizeopt #38

GoogleCodeExporter commented 9 years ago

pdfsizeopt breaks PDF/A conformance:

- It removes the /ID entry.
- It removes the line break before 'endobj' and 'endstream'.

I updated William Bader's patch so that PDF/A optimisation works:

./pdfsizeopt.py --use-multivalent=false test.pdfa.pdf test.opt.pdfa.pdf

(Multivalent breaks PDF/A.)

Original issue reported on code.google.com by j.lefebv...@gmail.com on 17 Jul 2010 at 1:47

Merged into: #13

Attachments:

pdfsizeopt.pat

GoogleCodeExporter commented 9 years ago

Which patch are you updating? Please give a URL to the original patch, and 
please attach the original pdfsizeopt.py you are patching.

In the long run, I think pdfsizeopt should not generate PDF/A files by default. 
So if you want to get this patch integrated into mainstream pdfsizeopt, please 
add it so that it has to be enabled by a command-line flag.
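
A minimal sketch of what such an opt-in could look like on the writer side. The function name `serialize_stream_object` and the `pdfa_compatible` parameter are hypothetical and are not part of pdfsizeopt; the sketch only shows the compact output staying the default and the PDF/A-friendly line breaks being emitted on request.

```python
def serialize_stream_object(obj_num, head, stream, pdfa_compatible=False):
    """Return the bytes of one indirect stream object.

    head is the serialized dictionary (e.g. b'<</Length 1>>') and stream is
    the raw stream data. Hypothetical helper, not pdfsizeopt's own code.
    """
    parts = [b'%d 0 obj\n' % obj_num, head, b'stream\n', stream]
    if pdfa_compatible:
        # Opt-in: put each closing keyword on its own line.
        parts.append(b'\nendstream\nendobj\n')
    else:
        # Default: smallest output, no line break before the keywords.
        parts.append(b'endstream endobj\n')
    return b''.join(parts)
```

A command-line flag would then only have to set `pdfa_compatible` once and pass it through to every object that is written.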

Original comment by pts...@gmail.com on 10 Feb 2011 at 7:50

GoogleCodeExporter commented 9 years ago

I've integrated most of the attached patch (pdfsizeopt.pat) into the trunk 
(r158), except for /Type/Page unification (which has to be disabled explicitly 
with --do-unify-pages=false), the Multivalent -nocore14 change, and these 
entries:

@@ -475,9 +475,9 @@
       output.append(self.stream)
       # We don't need '\nendstream' after a non-compressed content stream,
       # 'Qendstream endobj' is perfectly fine (accepted by gs and xpdf).
-      output.append('endstream endobj\n')
+      output.append('\nendstream\nendobj\n')
     else:
-      output.append('%sendobj\n' % space)
+      output.append('%s\nendobj\n' % space)

   def __GetHead(self):
     if self._head is None and self._cache is not None:
@@ -3302,7 +3310,7 @@
       trailer_obj.Set('Compress', None)  # emitted by Multivalent.jar
       # Emitted by Multivalent.jar etc., see section 10.3 in
       # pdf_reference_1-7.pdf .
-      trailer_obj.Set('ID', None)
+      #trailer_obj.Set('ID', None)
       assert trailer_obj.head.startswith('<<')
       assert trailer_obj.head.endswith('>>')
       output.append('trailer\n%s\n' % trailer_obj.head)
@@ -5816,7 +5871,7 @@
         # Please note that we save the space of the removed /ID and /Compress
         # below, because /Type/XRef is usually the last object, so we don't
         # need to add padding.
-        pdf_obj.Set('ID', None)
+        #pdf_obj.Set('ID', None)
         pdf_obj.Set('Compress', None)
         if pdf_obj.Get('Index') != None:
           raise NotImplementedError('unexpected /Index in xref object')
@@ -2592,15 +2592,17 @@
     else:
       pdf_obj.Set('BitsPerComponent', pdf_image_data['BitsPerComponent'])
       pdf_obj.Set('ColorSpace', pdf_image_data['ColorSpace'])
-      pdf_obj.Set('Decode', pdf_image_data.get('Decode'))
+      if pdf_obj.Get('Decode') == None:
+        # Update Decode only if it is currently not set
+        pdf_obj.Set('Decode', pdf_image_data.get('Decode'))
     pdf_obj.Set('Filter', pdf_image_data['Filter'])
     pdf_obj.Set('DecodeParms', pdf_image_data.get('DecodeParms'))
     pdf_obj.Set('Length', len(pdf_image_data['.stream']))
     # Don't pdf_obj.Set('Decode', ...): it is good as is.
     pdf_obj.stream = pdf_image_data['.stream']

   def CompressToZipPng(self):

About PDF/A compatibility: yes, /ID has to be present and endobj/endstream must 
have whitespace in front of them for PDF/A compatibility. These features will 
be added to pdfsizeopt later; please post further comments about PDF/A support 
to http://code.google.com/p/pdfsizeopt/issues/detail?id=13 .
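
The two requirements above can be checked roughly with a script like the following. It is a heuristic sketch, not a PDF/A validator: the name `find_pdfa_layout_problems` is made up for illustration, it assumes the required whitespace is an end-of-line marker (as the '\n' in the patch suggests), and it scans raw bytes, so keywords that happen to occur inside binary stream data will also be reported.

```python
import re


def find_pdfa_layout_problems(data):
    """Return a list of human-readable problems found in the PDF bytes."""
    problems = []

    # Check 1: an /ID entry should be present in the trailer. With a
    # cross-reference stream there is no 'trailer' keyword, so fall back
    # to searching the whole file.
    trailer_pos = data.rfind(b'trailer')
    tail = data[trailer_pos:] if trailer_pos >= 0 else data
    if b'/ID' not in tail:
        problems.append('no /ID entry found')

    # Check 2: every endstream/endobj keyword should directly follow
    # a CR or LF byte.
    for match in re.finditer(rb'endstream|endobj', data):
        start = match.start()
        if start == 0 or data[start - 1:start] not in (b'\r', b'\n'):
            problems.append(
                '%s at offset %d is not preceded by an end-of-line'
                % (match.group().decode('ascii'), start))
    return problems
```

Run on the bytes of an optimized file, an empty list means neither of the two problems was detected; it does not mean the file is valid PDF/A.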

Original comment by pts...@gmail.com on 4 Mar 2011 at 1:35

GoogleCodeExporter commented 9 years ago

The /Decode issue has been fixed in 
http://code.google.com/p/pdfsizeopt/issues/detail?id=37 , and the PDF/A issues 
are to be discussed in http://code.google.com/p/pdfsizeopt/issues/detail?id=13 
, so closing this issue now.

Original comment by pts...@gmail.com on 4 Mar 2011 at 1:43

Changed state: Duplicate
