Closed hfmandell closed 4 months ago
Can you give an example of the content of the `props` in `do_BDC()` that you would like to use?
The `props` are not immediately helpful on their own, in that each is simply an alphanumeric string that is unique to the particular OCG. In testing this, I've seen `props` that clearly denote an OCG, such as "/oc13". Others are less clear and are not reminiscent of the acronym "OCG". They can be seen, with the leading "/", in the output of `dumppdf.py` for a given PDF.
A bit more logic is needed to tie these `props` to the actual name of the OCG in the PDF, for example the "Roads" layer of a layered PDF map. Still, associating a PDF vector drawing with its `props` allows the user to categorize the LTCurves/Lines/Rects into their OCGs. A future MR could tie it directly to the PDF layer name.
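To make the categorization idea concrete, here is a minimal sketch of grouping layout objects by their OCG string. It assumes the `ocg` attribute proposed in this PR; `FakeCurve`, `group_by_ocg`, and the "/oc7" value are made up for illustration and are not part of pdfminer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class FakeCurve:
    """Hypothetical stand-in for LTCurve/LTLine/LTRect with the proposed `ocg` attribute."""
    name: str
    ocg: Optional[str] = None  # e.g. "/oc13"; None when the object is not in any OCG


def group_by_ocg(items: Iterable[object]) -> Dict[Optional[str], List[object]]:
    """Bucket layout objects by their `ocg` string; objects without one go under None."""
    groups: Dict[Optional[str], List[object]] = defaultdict(list)
    for item in items:
        groups[getattr(item, "ocg", None)].append(item)
    return dict(groups)


shapes = [
    FakeCurve("road-1", "/oc13"),
    FakeCurve("river-1", "/oc7"),
    FakeCurve("road-2", "/oc13"),
    FakeCurve("frame"),
]
by_layer = group_by_ocg(shapes)
```

With real pdfminer output the same function would be fed the LTCurve/LTLine/LTRect objects of a page instead of `FakeCurve` instances. Mapping a key such as "/oc13" to a layer name like "Roads" is the extra step mentioned above and is not covered here.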
Thanks for the extra info. I see now why storing the OCG could be useful in some specific cases.
I've been reading 8.11 (Optional Content) from the PDF Reference, but find it quite tricky to understand. Do you happen to have a PDF that has optional content groups that you can share? That would help me to understand them.
As far as I understand now, the properties of the `BDC` operator are also used for other purposes, not just OCGs. Therefore simply converting them to a string and storing it in the graphics state is not enough. E.g. the test PDFs have a couple of `BDC`s with a `/P` tag and some extra properties. I think these are unrelated to OCGs, but correct me if I'm wrong.
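One possible way to address this, sketched under assumptions: only treat a `BDC` as optional content when its tag is `OC` and its properties are a name rather than an inline dictionary. This follows my reading of the Optional Content section, where content-stream membership is written as `/OC /name BDC ... EMC`. The helper `ocg_from_bdc` is hypothetical, and it takes plain Python values instead of pdfminer's own literal objects.

```python
from typing import Optional


def ocg_from_bdc(tag: str, props: object) -> Optional[str]:
    """Return the property name for optional-content BDC operators, else None.

    Assumption: `tag` and `props` arrive as plain strings or dicts. In pdfminer
    they would be parsed PDF objects and need converting first.
    """
    if tag.lstrip("/") != "OC":
        return None  # e.g. a /P tag used for structure, not for layers
    if not isinstance(props, str):
        return None  # inline property dictionaries are not treated as OCG references here
    return props


layer = ocg_from_bdc("OC", "/oc13")
paragraph = ocg_from_bdc("P", {"MCID": 0})  # illustrative /P properties
```

The spec also notes that other applications may reuse the `OC` tag, so a stricter check would confirm that the name resolves to an optional content group in the page's Properties resources.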
Closing because no response. Feel free to reopen when extra info is available.
Pull request
This PR fixes issue 903, which I raised after encountering this problem.
Many vector PDFs have Optional Content Groups (OCGs), also referred to as layers. When extracting LTComponents such as LTCurve, LTLine, and LTRect, one may need to keep track of which OCG each LTComponent belongs to. This is accomplished by:
- Adding `ocg` attributes to LTCurve, LTLine, and LTRect in 'pdfminer/layout.py'
- Using the `ocg` attribute in 'pdfminer/converter.py'
- Adding an `ocg` attribute to the PDFGraphicState object in 'pdfminer/pdfinterp.py'
- Setting the `ocg` attribute in 'pdfminer/pdfinterp.py' when the vector graphic BDC command is encountered in the PDF's stream, and ensuring the current `ocg` value is maintained even when the graphic state is restored with the vector graphic Q command.
How Has This Been Tested?
Please remove this paragraph with a description of how this PR has been tested. [TODO]
Checklist