A Python package for converting PDF files to Word documents and modifying URLs. This package utilizes Tesseract OCR for text recognition in PDF files.
To install the package, use pip:
pip install persian-pdf-converter
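Because the package relies on Tesseract OCR, the Tesseract binary and its language data must also be present on the system. The sketch below is one possible pre-flight check, not part of this package: `installed_tesseract_languages` and `missing_languages` are hypothetical helper names, and the check assumes the `tesseract --list-langs` command together with the `fas` (Persian) and `eng` (English) language codes.

```python
import shutil
import subprocess


def installed_tesseract_languages():
    """Return the language codes reported by `tesseract --list-langs`.

    Returns an empty set if the tesseract binary is not on PATH.
    """
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Depending on the Tesseract version, the list goes to stdout or stderr.
    lines = (result.stdout + "\n" + result.stderr).splitlines()
    # Language codes contain no spaces; the header line and any messages do.
    return {ln.strip() for ln in lines if ln.strip() and " " not in ln.strip()}


def missing_languages(lang_spec, installed):
    """Return the codes in a 'fas+eng'-style spec that are not in `installed`."""
    requested = [code for code in lang_spec.split("+") if code]
    return [code for code in requested if code not in installed]


if __name__ == "__main__":
    missing = missing_languages("fas+eng", installed_tesseract_languages())
    if missing:
        print("Missing Tesseract language data:", ", ".join(missing))
    else:
        print("Tesseract and the requested languages are available.")
```

If a language is reported as missing, installing the matching Tesseract language data for your platform should resolve it.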
Here is an example of how to use the functions provided by this package:
from persian_pdf_converter.pdf_converter import pdf_to_word
# Path to your PDF file and output directory
pdf_path = 'path/to/example.pdf'
output_dir = 'path/to/output/dir'
# Convert PDF to Word
output_file = pdf_to_word(pdf_path, output_dir, lang="fas+eng", dpi=300)
print(f"Converted file saved as: {output_file}")
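For converting several files at once, the call above can be wrapped in a small loop. This is a minimal sketch under stated assumptions: `convert_directory` is a hypothetical helper, not part of the package, and it takes the converter as an argument, assuming it is called the same way as `pdf_to_word` in the example above.

```python
from pathlib import Path


def convert_directory(input_dir, output_dir, converter, lang="fas+eng", dpi=300):
    """Convert every .pdf file directly inside input_dir using `converter`.

    `converter` is called as converter(pdf_path, output_dir, lang=..., dpi=...).
    Returns a dict mapping each PDF path to the converter's return value,
    or to the exception raised while converting that file.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # make sure the output directory exists
    results = {}
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        try:
            results[str(pdf)] = converter(str(pdf), str(out), lang=lang, dpi=dpi)
        except Exception as exc:  # record the failure and continue with the next file
            results[str(pdf)] = exc
    return results
```

With the package installed, this could be called as `convert_directory('path/to/pdfs', 'path/to/output/dir', pdf_to_word)`.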
The pdf_to_word function converts a PDF file to a Word document with text recognition.
Parameters:
pdf_path (str): Path to the PDF file.
output_dir (str): Directory where the output Word file will be saved.
lang (str): Languages to be used by Tesseract for text recognition (default is "fas+eng").
dpi (int): Resolution used when rendering the PDF pages to images with convert_from_path.
Returns:
str: Name of the output Word file.
To contribute to this project, follow these steps:
git clone https://github.com/mahdiramezanii/persian_pdf_converter.git
cd persian_pdf_converter
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
This project is licensed under the MIT License. See the LICENSE file for more details.
If you have any questions or suggestions, feel free to contact me at mahdiramazanii.official@gmail.com.