dwarring closed this issue 5 years ago.
This task is currently driving some development in PDF::Content to improve handling of marked content.
As well as extracting content, the PDF Specification could be annotated and further indexed to cross-reference sections with Perl 6 classes.
Work is underway on extracting tables to https://github.com/p6-pdf/p6-pdf.github.io/tree/master/doc/tables. See the Makefile in that directory for details.
Work has been migrated to the newly created module https://github.com/p6-pdf/PDF-Specification-p6
Work is underway on integrating the generated interface roles into PDF::Class on the iso32000 branch.
Have uploaded this to CPAN as ISO::PDF_32000 and merged the iso32000 branch.
Elected to remove this from the main branch prior to the latest CPAN release.
Has been a useful exercise and has improved the accuracy of PDF::Class. May revisit it in a future release.
I suspect that much of the material has been directly or indirectly produced from Adobe's internal data dictionaries, and accurately represents the internal structure as implemented in the Adobe suite.
I've been manually constructing classes from the spec (which is what everyone seems to do), but a structured dump of the object definitions would greatly assist with checking the classes built so far and with completing PDF::Class.
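A rough sketch of how such a dump could help check a hand-built class, written in Python purely for illustration (it is not PDF::Class code). It assumes, hypothetically, that a spec table has been dumped as a JSON array with one object per row; the rows below are an abbreviated, illustrative subset of the catalog dictionary entries, and `check_class` and the `implemented` key set are made-up names. Keys listed in the table but absent from the class are reported as missing; keys in the class that the table does not list (often typos) are reported as unknown.

```python
import json

# Illustrative, abbreviated rows in the shape a table dump might take
# (one object per row); not the full catalog dictionary table.
CATALOG_TABLE_JSON = """
[
  {"Key": "Type",     "Type": "name"},
  {"Key": "Pages",    "Type": "dictionary"},
  {"Key": "Outlines", "Type": "dictionary"},
  {"Key": "Lang",     "Type": "text string"}
]
"""

def check_class(spec_rows, implemented):
    """Compare the keys listed in a spec table with the keys a class implements.

    Returns (missing, unknown): spec keys the class lacks, in table order,
    and class keys that do not appear in the table, sorted.
    """
    spec_keys = [row["Key"] for row in spec_rows]
    missing = [k for k in spec_keys if k not in implemented]
    unknown = sorted(set(implemented) - set(spec_keys))
    return missing, unknown

rows = json.loads(CATALOG_TABLE_JSON)
# Hypothetical accessor names from a hand-built class; "Outline" is a deliberate typo.
implemented = {"Type", "Pages", "Outline"}
missing, unknown = check_class(rows, implemented)
print("missing:", missing)  # missing: ['Outlines', 'Lang']
print("unknown:", unknown)  # unknown: ['Outline']
```

Reporting unknown keys as well as missing ones matters for accuracy: a misspelled key in a hand-written class silently never matches anything in a real PDF.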
Hopefully such a dump is achievable with the tools built to date. For example, both pdf-checker.p6 and pdf-toc.p6 are capable of scanning the specification PDF.
Dumping the tables in the PDF spec to JSON or some such would be a big help. These seem to be reasonably well defined as tables via the document's struct tree root.
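The idea of pulling tables out via the struct tree root can be sketched roughly as follows. This is a Python illustration, not PDF::Class or PDF::Content code, and it makes two simplifying assumptions: the structure tree has already been loaded into nested dictionaries with `S` (structure type) and `K` (kids) entries, and marked content has already been resolved to plain text (in a real PDF, `/K` leaves are usually marked-content IDs pointing into page content streams). The helpers `elem`, `find_tables`, `table_to_rows` and `table_to_records` are hypothetical; the only thing taken from the PDF spec is the set of standard table structure types (Table, THead, TBody, TFoot, TR, TH, TD).

```python
import json

def elem(s, *kids):
    """Build a simplified structure element: S = structure type, K = kids."""
    return {"S": s, "K": list(kids)}

def text_of(node):
    """Concatenate all (already resolved) text beneath a structure element."""
    if isinstance(node, str):
        return node
    return "".join(text_of(k) for k in node["K"])

def find_tables(node):
    """Depth-first search for Table elements in document order
    (does not descend into a Table once found)."""
    if isinstance(node, str):
        return []
    if node["S"] == "Table":
        return [node]
    return [t for k in node["K"] for t in find_tables(k)]

def table_to_rows(table):
    """Return a list of rows of cell text. TR elements may sit directly
    under the Table or inside THead/TBody/TFoot groupings."""
    rows = []
    for kid in table["K"]:
        if isinstance(kid, str):
            continue
        if kid["S"] == "TR":
            rows.append([text_of(c) for c in kid["K"]
                         if not isinstance(c, str) and c["S"] in ("TH", "TD")])
        elif kid["S"] in ("THead", "TBody", "TFoot"):
            rows.extend(table_to_rows(kid))
    return rows

def table_to_records(rows):
    """Use the first row as the header and map each later row onto it."""
    header, *body = rows
    return [dict(zip(header, r)) for r in body]

# Illustrative tree, loosely modelled on a spec "entries" table.
root = elem("Document",
    elem("P", "Entries in the catalog dictionary"),
    elem("Table",
        elem("THead",
            elem("TR", elem("TH", "Key"), elem("TH", "Type"), elem("TH", "Value"))),
        elem("TBody",
            elem("TR", elem("TD", "Type"), elem("TD", "name"), elem("TD", "(Required)")),
            elem("TR", elem("TD", "Pages"), elem("TD", "dictionary"), elem("TD", "(Required)")))))

tables = [table_to_records(table_to_rows(t)) for t in find_tables(root)]
print(json.dumps(tables, indent=2))
```

Keying each row by the header cells gives JSON objects like `{"Key": "Pages", "Type": "dictionary", ...}`, which is convenient for comparing against class definitions; a table without a header row would need the plain `table_to_rows` output instead.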