mahdiramezanii / persian_pdf_converter

Python package to convert PDF files to Farsi (Persian) Word documents
https://pypi.org/project/persian-pdf-converter/
MIT License

persian_pdf_converter

A Python package for converting PDF files to Word documents and modifying URLs. It uses Tesseract OCR to recognize the text in PDF files.
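The recognition step relies on Tesseract, and the `lang="fas+eng"` value in the usage example looks like a Tesseract language spec. If so, the Persian (`fas`) and English (`eng`) language data must be installed next to the `tesseract` binary. The following is a minimal, stdlib-only sketch for checking this before converting. It assumes the `tesseract` executable is on `PATH`, and its helper names are hypothetical, not part of the package:

```python
import shutil
import subprocess


def parse_tesseract_langs(output: str) -> set[str]:
    """Parse the text printed by `tesseract --list-langs` into a set of language codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header line.
        if not line or line.startswith("List of available languages"):
            continue
        langs.add(line)
    return langs


def missing_langs(lang_spec: str, available: set[str]) -> list[str]:
    """Return the codes in a spec such as "fas+eng" that are not in `available`."""
    return [code for code in lang_spec.split("+") if code and code not in available]


def installed_tesseract_langs() -> set[str]:
    """Ask the local tesseract binary for its language packs (empty set if not found)."""
    exe = shutil.which("tesseract")
    if exe is None:
        return set()
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some Tesseract versions print the list on stderr, so read both streams.
    return parse_tesseract_langs("\n".join([result.stdout, result.stderr]))


if __name__ == "__main__":
    missing = missing_langs("fas+eng", installed_tesseract_langs())
    if missing:
        print("Missing Tesseract language data:", ", ".join(missing))
    else:
        print("Tesseract has the language data for lang='fas+eng'")
```

The spec is split on `+` because Tesseract uses that character to combine several languages in one run.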

Features

Requirements

Installation

To install the package, use pip:

pip install persian-pdf-converter
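The distribution is installed as `persian-pdf-converter` but imported as `persian_pdf_converter`. A quick post-install check can confirm both names resolve in the interpreter you are using, which is handy when several virtual environments are around. This is a small sketch using only the standard library, and the helper names are hypothetical:

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str = "persian-pdf-converter") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def importable(module_name: str = "persian_pdf_converter") -> bool:
    """True if the import system of this interpreter can find the module."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("installed version:", installed_version())
    print("importable:", importable())
```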

Usage

Here is an example of how to use the functions provided by this package:

from persian_pdf_converter.pdf_converter import pdf_to_word

# Path to your PDF file and output directory
pdf_path = 'path/to/example.pdf'
output_dir = 'path/to/output/dir'

# Convert PDF to Word
output_file = pdf_to_word(pdf_path, output_dir, lang="fas+eng", dpi=300)
print(f"Converted file saved as: {output_file}")
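To convert a whole folder, the single-file call can be wrapped in a loop. The sketch that follows uses a hypothetical helper, `convert_folder`, which is not part of the package. It assumes `pdf_to_word` accepts the same arguments as in the example and returns the output path, and it takes the converter as an optional parameter so the loop can be tried with a stand-in function:

```python
from pathlib import Path
from typing import Callable, Optional


def convert_folder(
    input_dir: str,
    output_dir: str,
    lang: str = "fas+eng",
    dpi: int = 300,
    converter: Optional[Callable[..., str]] = None,
) -> dict[str, str]:
    """Convert every PDF in input_dir; map each source path to the returned output path."""
    if converter is None:
        # Imported lazily so the helper can be exercised with a stand-in converter.
        from persian_pdf_converter.pdf_converter import pdf_to_word
        converter = pdf_to_word

    # Create the output directory up front in case the converter does not.
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    results: dict[str, str] = {}
    # Match ".pdf" case-insensitively so files such as "SCAN.PDF" are included too.
    pdfs = sorted(
        p for p in Path(input_dir).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    for pdf in pdfs:
        # Same call shape as the example: pdf_to_word(pdf_path, output_dir, lang=..., dpi=...)
        results[str(pdf)] = converter(str(pdf), output_dir, lang=lang, dpi=dpi)
    return results
```

Calling `convert_folder("path/to/pdfs", "path/to/output/dir")` converts each PDF in turn with the same `lang` and `dpi` settings as the single-file example.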

pdf_to_word Function

This function converts a PDF file to a Word document with text recognition.

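Parameters:

  - pdf_path: path to the input PDF file.
  - output_dir: directory in which the Word document is saved.
  - lang: Tesseract language code(s), joined with "+" for several languages, e.g. "fas+eng" for Persian and English.
  - dpi: image resolution (dots per inch) used for the OCR step, e.g. 300.

Returns:

  - The path of the generated Word file.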

Development

To contribute to this project, follow these steps:

  1. Clone the repository:
    git clone https://github.com/mahdiramezanii/persian_pdf_converter.git
  2. Navigate to the project directory:
    cd persian_pdf_converter
  3. Create a virtual environment and activate it:
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  4. Install the dependencies:
    pip install -r requirements.txt
  5. Make your changes and run tests.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

If you have any questions or suggestions, feel free to contact me at mahdiramazanii.official@gmail.com.