MLX Nougat is a CLI tool for OCR using the Nougat model.
Install ImageMagick:
brew install imagemagick
Configure environment variables for ImageMagick:
Add the following lines to your shell configuration file (e.g., ~/.bashrc, ~/.zshrc):
export MAGICK_HOME=$(brew --prefix imagemagick)
export PATH=$MAGICK_HOME/bin:$PATH
export DYLD_LIBRARY_PATH=$MAGICK_HOME/lib:$DYLD_LIBRARY_PATH
After adding these lines, reload your shell configuration or restart your terminal.
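To confirm the variables took effect after reloading, a small check like the one below can help. It is only a sketch: `check_magick_env` is a hypothetical helper name, not part of MLX Nougat or ImageMagick, and it inspects only `MAGICK_HOME` and `PATH`.

```shell
# Hypothetical helper: verifies that MAGICK_HOME is set and that
# $MAGICK_HOME/bin appears as an entry in PATH.
check_magick_env() {
  if [ -z "${MAGICK_HOME:-}" ]; then
    echo "MAGICK_HOME is not set" >&2
    return 1
  fi
  case ":$PATH:" in
    *":$MAGICK_HOME/bin:"*)
      echo "ok"
      ;;
    *)
      echo "MAGICK_HOME/bin is not on PATH" >&2
      return 1
      ;;
  esac
}
```

Running `check_magick_env` in a fresh terminal should print `ok` if the three export lines were picked up.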
Install MLX Nougat:
git clone git@github.com:mzbac/mlx-nougat.git
cd mlx-nougat
pip install .
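One way to check that the install put the `mlx_nougat` command on your PATH is sketched below. `require_cmd` is a hypothetical helper name used for illustration; it relies only on the standard shell builtin `command -v`.

```shell
# Hypothetical helper: reports whether a command can be found on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# If the command is missing, the active Python environment may differ
# from the one used for 'pip install .'.
require_cmd mlx_nougat || echo "mlx_nougat not found; check the active Python environment"
```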
After installation, you can use MLX Nougat from the command line:
mlx_nougat --input <path_to_image_or_pdf_or_url> [--output <output_file>] [--model <model_name_or_path>]
--input: (Required) Path to the input image or PDF file, or a URL to an image or PDF.
--output: (Optional) Path to save the output text file. If not provided, the output will be printed to the console.
--model: (Optional) Name or path of the Nougat model to use. Default is "facebook/nougat-small".
Process a local image:
mlx_nougat --input path/to/your/image.png --output results.txt
Process a local PDF:
mlx_nougat --input path/to/your/document.pdf --output results.txt
Process a remote image:
mlx_nougat --input https://example.com/image.jpg --output results.txt
Process a remote PDF:
mlx_nougat --input https://example.com/document.pdf --output results.txt
Use a different model:
mlx_nougat --input path/to/your/image.png --model facebook/nougat-base --output results.txt
Use a quantized model:
mlx_nougat --input path/to/your/document.pdf --model mzbac/nougat-small-8bit-mlx
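The examples above handle one file at a time. To process a whole directory, a loop along the following lines may work. This is a sketch, not a feature of the tool: `nougat_batch` and `NOUGAT_CMD` are hypothetical names, and the loop uses only the `--input` and `--output` options described above.

```shell
# Sketch: convert every PDF in a directory, writing one .txt file per input.
# NOUGAT_CMD and nougat_batch are illustrative names, not part of MLX Nougat.
NOUGAT_CMD="${NOUGAT_CMD:-mlx_nougat}"

nougat_batch() {
  dir="$1"
  for pdf in "$dir"/*.pdf; do
    # Skip the unexpanded pattern when the directory contains no PDFs.
    [ -e "$pdf" ] || continue
    out="${pdf%.pdf}.txt"
    # Report a failed file on stderr and continue with the next one.
    "$NOUGAT_CMD" --input "$pdf" --output "$out" || echo "failed: $pdf" >&2
  done
}
```

For example, `nougat_batch path/to/your/pdfs` would write `document.txt` next to each `document.pdf`.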
This project is built upon several open-source projects and research works: