Unwanted space between the letters of a word

I used the library to extract the text from a PDF file. Some words in the extracted text are split into two parts by an unwanted space.

Text from PDF:

Text from text file:

Environment

OS: Windows 10
Python: 3.11
PyPDF: the latest

Code + PDF

This is how I used the library:

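import PyPDF2


def convert_pdf_to_text(file_name):
    """Return the text of every page in the PDF, concatenated."""
    out = ""
    # Use a context manager so the file handle is closed after reading.
    with open(file_name, 'rb') as pdf_file_obj:
        pdf_reader = PyPDF2.PdfReader(pdf_file_obj, strict=True)
        for page in pdf_reader.pages:
            out += page.extract_text()
    return out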

The PDF contains confidential data, so I cannot share it. I reproduced the problem with several confidential PDFs.

Traceback

This is the subset of the (operands, operator) tuples that I printed to trace the problem. The Tm operator triggers a call to the orientation function, which adds the unwanted space. I have briefly reviewed the PDF 1.7 specification, but I still do not know exactly what the Tm operator does.

After I removed the space-adding sections of the orientation function, the text was extracted correctly.
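
The suspected mechanism can be illustrated with a minimal sketch. This is not pypdf's actual code. In the PDF specification, Tm replaces the text matrix with its six operands, and the last two give the new text origin. The sketch assumes a made-up list of (operands, operator) tuples, a fixed glyph width (`CHAR_WIDTH`), text on a single baseline, and a hypothetical `gap_threshold` parameter. It shows how adding a space on every Tm splits a word, while checking the horizontal gap first does not.

```python
# Hypothetical, simplified content-stream operations as (operands, operator) tuples.
# Real streams contain more operators and real glyph widths; fixed widths are assumed.
OPERATIONS = [
    ([1, 0, 0, 1, 72.0, 700.0], "Tm"),   # set text matrix: start of the line
    (["Unwan"], "Tj"),                    # show text
    ([1, 0, 0, 1, 102.0, 700.0], "Tm"),  # reposition exactly where "Unwan" ended
    (["ted"], "Tj"),
    ([1, 0, 0, 1, 126.0, 700.0], "Tm"),  # reposition leaving a visible gap
    (["space"], "Tj"),
]

CHAR_WIDTH = 6.0  # assumed fixed advance per glyph, in text-space units


def extract(operations, gap_threshold=None):
    """Join Tj strings. On Tm, add a space always (gap_threshold=None)
    or only when the horizontal gap exceeds gap_threshold."""
    out = []
    end_x = None  # x coordinate where the previously shown text ended
    cur_x = 0.0
    for operands, operator in operations:
        if operator == "Tm":
            # Tm replaces the text matrix; operand index 4 is the new x origin.
            new_x = operands[4]
            if end_x is not None:
                gap = new_x - end_x
                if gap_threshold is None or gap > gap_threshold:
                    out.append(" ")
            cur_x = new_x
        elif operator == "Tj":
            text = operands[0]
            out.append(text)
            cur_x += CHAR_WIDTH * len(text)
            end_x = cur_x
    return "".join(out)


print(extract(OPERATIONS))                     # space added on every Tm
print(extract(OPERATIONS, gap_threshold=1.5))  # space added only for a real gap
```

With these assumed inputs, the first call splits "Unwanted" in two, and the second call keeps it whole while still separating the two words.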
