jorisschellekens / borb

borb is a library for reading, creating and manipulating PDF files in Python.
https://borbpdf.com/

Copying a font in a PDF using low-level syntax #165

Closed: moukraintsev closed this issue 1 year ago

moukraintsev commented 1 year ago

Hello! I am using borb to research techniques for embedding and extracting digital watermarks. I use low-level PDF syntax to position individual characters absolutely in an existing document.

The problem occurs when I try to use the font of the source document: the output document opens with a default font and the message: Font "Times_roman" contains wrong /BBox

Is it possible to fix this, or what should I do instead?

base_file.pdf - original document
work_file.pdf - modified document

NOTE: the document was created by borb, with parameter FONT = 'Times_roman'
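One plausible cause of the /BBox message, which the thread itself does not confirm: the code below declares a non-embedded Type1 font with no /FontDescriptor, and a viewer can only render such a font without substitution when /BaseFont is one of the 14 standard font names defined by the PDF specification. 'Times_roman' is not one of them; the standard name is 'Times-Roman'. The sketch below models a font resource as a plain Python dict rather than borb's `Dictionary`/`Name` objects, and `build_type1_font_dict` is a hypothetical helper, not part of borb.

```python
# The 14 standard Type1 font names defined by the PDF specification.
# A conforming viewer supplies these fonts even when they are not embedded.
STANDARD_14_FONTS = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def build_type1_font_dict(base_font: str, encoding: str = "MacRomanEncoding") -> dict:
    """Hypothetical helper: model a non-embedded Type1 font resource as a dict.

    Raises ValueError for names outside the standard 14, because such a font
    would need at least a /FontDescriptor (and usually an embedded program).
    """
    if base_font not in STANDARD_14_FONTS:
        raise ValueError(f"{base_font!r} is not a standard-14 font name")
    font = {"Type": "Font", "Subtype": "Type1", "BaseFont": base_font}
    # Symbol and ZapfDingbats use their own built-in encodings.
    if base_font not in ("Symbol", "ZapfDingbats"):
        font["Encoding"] = encoding
    return font
```

With this check, `build_type1_font_dict("Times-Roman")` succeeds, while `build_type1_font_dict("Times_roman")` raises before a broken font dictionary reaches the file.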

Code:



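    import typing
    import zlib
    from io import BufferedIOBase, RawIOBase

    from borb.io.read.types import Decimal as bDecimal
    from borb.io.read.types import Dictionary, Name, Stream
    from borb.pdf import PDF, Document, Page
    from borb.toolkit import RegularExpressionTextExtraction

    # WORK_FILE_PATH and coord_array are defined elsewhere in the script (not shown)

    # Reading document
    FONT = 'Times_roman'  # note: the standard-14 PostScript name is 'Times-Roman'
    l: RegularExpressionTextExtraction = RegularExpressionTextExtraction(r'[\s\S ]')
    pdf_file_handle: typing.Union[BufferedIOBase, RawIOBase]
    in_file_handle: typing.Union[BufferedIOBase, RawIOBase]
    doc: typing.Optional[Document] = None

    with open(WORK_FILE_PATH, "rb") as in_file_handle:
        doc = PDF.loads(in_file_handle, [l])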

    # check whether we have read a Document
    assert doc is not None

    # Adding new page for adding symbols
    page = Page()
    doc.add_page(page)

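    # Creating content stream: one positioned text-showing operation per character
    content_stream = Stream()

    # each coord_array entry is used as (char, x, y, ..., ..., font_size)
    content = b""
    for symbol in coord_array:
        # Python's mac_roman codec approximates the /MacRomanEncoding declared for /F1;
        # then escape the bytes that are special inside a PDF literal string: \ ( )
        char = str(symbol[0]).encode("mac_roman")
        char = char.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
        content += b"q BT /F1 %.4f Tf " % symbol[5]
        content += b"%.4f %.4f Td " % (symbol[1], symbol[2])
        content += b"(%b) Tj ET Q " % char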

    content_stream[Name("DecodedBytes")] = content
    content_stream[Name("Bytes")] = zlib.compress(content_stream["DecodedBytes"], 9)
    content_stream[Name("Filter")] = Name("FlateDecode")
    content_stream[Name("Length")] = bDecimal(len(content_stream["Bytes"]))

    # set content of page
    page[Name("Contents")] = content_stream

    # set Font
    # It seems that I need to copy the font from the source page and put it here, but I don't quite understand how to do it

    page[Name("Resources")] = Dictionary()
    page["Resources"][Name("Font")] = Dictionary()
    page["Resources"]["Font"][Name("F1")] = Dictionary()
    page["Resources"]["Font"]["F1"][Name("Type")] = Name("Font")
    page["Resources"]["Font"]["F1"][Name("Subtype")] = Name("Type1")
    page["Resources"]["Font"]["F1"][Name("Name")] = Name("F1")
    page["Resources"]["Font"]["F1"][Name("BaseFont")] = Name(FONT)
    page["Resources"]["Font"]["F1"][Name("Encoding")] = Name("MacRomanEncoding")

    # delete the original page
    doc.pop_page(0)

    # store PDF
    with open(WORK_FILE_PATH, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)
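Any character written into a `( ... ) Tj` literal string must have backslash and parentheses escaped, or a watermark character such as `(` corrupts the content stream, and its bytes should follow the encoding the font declares (`/MacRomanEncoding` here, approximated by Python's `mac_roman` codec). The stdlib-only sketch below factors the per-character logic out of the borb code so it can be checked on its own. It assumes each `coord_array` entry looks like `(char, x, y, ..., ..., font_size)`, inferred from the indices 0, 1, 2 and 5 that the loop uses; `escape_pdf_literal`, `build_glyph_content` and `flate_stream_fields` are hypothetical names.

```python
import zlib


def escape_pdf_literal(raw: bytes) -> bytes:
    """Escape the three bytes that are special inside a PDF literal string."""
    # Backslash first, so the escapes added for parentheses are not doubled.
    return raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")


def build_glyph_content(coord_array, font_resource: str = "F1") -> bytes:
    """Emit one positioned show-text operation per character.

    Assumes each entry is (char, x, y, _, _, font_size). Characters outside
    mac_roman raise UnicodeEncodeError instead of producing wrong glyphs.
    """
    parts = []
    for symbol in coord_array:
        char, x, y, size = symbol[0], symbol[1], symbol[2], symbol[5]
        text = escape_pdf_literal(str(char).encode("mac_roman"))
        parts.append(
            b"q BT /%b %.4f Tf %.4f %.4f Td (%b) Tj ET Q"
            % (font_resource.encode("ascii"), size, x, y, text)
        )
    return b" ".join(parts)


def flate_stream_fields(content: bytes) -> dict:
    """Compress the content and compute the /Length the stream dictionary needs."""
    compressed = zlib.compress(content, 9)
    return {"Filter": "FlateDecode", "Length": len(compressed), "Bytes": compressed}
```

For example, an entry `("(", 1.5, 2.25, None, None, 9)` becomes `q BT /F1 9.0000 Tf 1.5000 2.2500 Td (\() Tj ET Q`, which a parser reads as a single `(` glyph.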
jorisschellekens commented 1 year ago

Hi,

Please keep in mind the GitHub issues section of this project is meant for bug reports and feature requests. Your request is more of a "How can I do this?" or "Help me with this code".

We do have a tag on Stack Overflow, which you can use.

Also, a small, self-contained code sample would make it easier for others to help you.

Lastly, if you intend to do this, there are (in my opinion) better/easier ways.

You could parse the content stream of the Page and modify it (borb already does this whenever you're applying `RedactionAnnotation`s, so be sure to check out that code).
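To give an idea of what parsing a content stream involves, here is a minimal stdlib-only tokenizer. It is not borb's own parser, which handles far more of the syntax; it only understands what the question's stream uses (numbers, names, literal strings with `\\`, `\(` and `\)` escapes, and operators) and collects each `Tj` together with the font set by `Tf` and the position set by `Td`. `tokenize` and `extract_text_runs` are hypothetical names.

```python
import re

# Numbers, names and bare operators; literal strings are read by _read_literal.
_TOKEN = re.compile(
    rb"(?P<num>[+-]?(?:\d+\.?\d*|\.\d+))"
    rb"|(?P<name>/[^\s/\[\]()<>{}%]+)"
    rb"|(?P<op>[A-Za-z'\"*]+)"
)


def _read_literal(data: bytes, pos: int):
    """Read a literal string starting just after '(' and return (bytes, next_pos)."""
    out, depth = bytearray(), 1
    while pos < len(data):
        c = data[pos]
        if c == 0x5C:  # backslash: keep the next byte as-is (enough for \\ \( \))
            out.append(data[pos + 1])
            pos += 2
            continue
        if c == 0x28:  # unescaped '(' nests
            depth += 1
        elif c == 0x29:  # unescaped ')' closes one level
            depth -= 1
            if depth == 0:
                return bytes(out), pos + 1
        out.append(c)
        pos += 1
    raise ValueError("unterminated literal string")


def tokenize(data: bytes):
    """Yield (kind, value) pairs where kind is 'num', 'name', 'op' or 'str'."""
    pos = 0
    while pos < len(data):
        if data[pos:pos + 1].isspace():
            pos += 1
        elif data[pos] == 0x28:  # '(' starts a literal string
            value, pos = _read_literal(data, pos + 1)
            yield "str", value
        else:
            m = _TOKEN.match(data, pos)
            if m is None:
                raise ValueError(f"unsupported syntax at byte {pos}")
            yield m.lastgroup, m.group()
            pos = m.end()


def extract_text_runs(data: bytes):
    """Return (font, size, x, y, text) for every Tj in the stream.

    Td offsets accumulate within a BT ... ET block; cm and Tm are ignored.
    """
    runs, operands = [], []
    font, size, x, y = None, None, 0.0, 0.0
    for kind, value in tokenize(data):
        if kind != "op":
            operands.append(value)
            continue
        if value == b"BT":
            x, y = 0.0, 0.0  # BT resets the text matrix
        elif value == b"Tf":
            font = operands[-2][1:].decode("ascii")
            size = float(operands[-1].decode("ascii"))
        elif value == b"Td":
            x += float(operands[-2].decode("ascii"))
            y += float(operands[-1].decode("ascii"))
        elif value == b"Tj":
            runs.append((font, size, x, y, operands[-1]))
        operands.clear()
    return runs
```

Once the runs are known, a modification step could rewrite or drop individual `Tj` operations and serialize the tokens back, which is roughly the idea behind reusing a page's existing content instead of adding a new page.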

You could build your own version of `ChunkOfText` that optionally takes a `letter_spacing` parameter. Then you could look at `SimpleFindReplace`, which removes text and replaces it with other text; that would also give you the right kind of inspiration.
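As an illustration of the `letter_spacing` idea: in PDF content streams the `Tc` operator sets character spacing, an extra amount in unscaled text-space units added after every glyph, so a whole string can be placed with one `Td` instead of one per character. The sketch below is plain Python that emits content-stream bytes; `show_text_with_spacing` and `advance_width` are hypothetical names, not borb's `ChunkOfText` API, and glyph widths are assumed inputs that would normally come from the font's /Widths array or metrics.

```python
def escape_pdf_literal(raw: bytes) -> bytes:
    """Escape backslash and parentheses for a PDF literal string."""
    return raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")


def show_text_with_spacing(text: str, x: float, y: float, font: str = "F1",
                           size: float = 12.0, letter_spacing: float = 0.0) -> bytes:
    """Place a whole string once and let Tc add letter_spacing after each glyph."""
    encoded = escape_pdf_literal(text.encode("mac_roman"))
    return (b"BT /%b %.4f Tf %.4f Tc %.4f %.4f Td (%b) Tj ET"
            % (font.encode("ascii"), size, letter_spacing, x, y, encoded))


def advance_width(glyph_widths, size: float, letter_spacing: float) -> float:
    """Horizontal advance of a run: each glyph moves by w * size / 1000 + Tc.

    glyph_widths are in 1/1000 text-space units (as in a font's /Widths array);
    word spacing (Tw) and horizontal scaling (Tz) are ignored here.
    """
    return sum(w * size / 1000.0 + letter_spacing for w in glyph_widths)
```

Keeping one `Tj` per word also keeps the text extractable as words, whereas one `Td`/`Tj` pair per character tends to fragment it for other tools.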

Kind regards, Joris Schellekens