mzbac / mlx-nougat


MLX Nougat

MLX Nougat is a command-line tool for optical character recognition (OCR) of images and PDFs using the Nougat model.

Installation

  1. Install ImageMagick:

    brew install imagemagick
  2. Configure environment variables for ImageMagick:

    Add the following lines to your shell configuration file (e.g., ~/.bashrc, ~/.zshrc):

    export MAGICK_HOME=$(brew --prefix imagemagick)
    export PATH=$MAGICK_HOME/bin:$PATH
    export DYLD_LIBRARY_PATH=$MAGICK_HOME/lib:$DYLD_LIBRARY_PATH

    After adding these lines, reload your shell configuration or restart your terminal.

  3. Install MLX Nougat:

    git clone git@github.com:mzbac/mlx-nougat.git
    cd mlx-nougat
    pip install .
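To confirm that the installation steps above took effect in the current shell, you can check for the exported variable and the installed commands from Python. The following is a hypothetical helper, not part of MLX Nougat. It assumes the magick binary that ImageMagick 7 provides (Homebrew's imagemagick formula installs version 7), and it assumes that pip install . put the mlx_nougat command on PATH.

```python
import os
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report whether the pieces set up during installation are visible.

    Hypothetical helper for illustration; not part of MLX Nougat.
    """
    magick_home = os.environ.get("MAGICK_HOME", "")
    return {
        # Step 2 exports MAGICK_HOME; it should point at an existing directory.
        "MAGICK_HOME set": bool(magick_home) and os.path.isdir(magick_home),
        # ImageMagick 7 (what Homebrew installs) provides the `magick` binary.
        "magick on PATH": shutil.which("magick") is not None,
        # After `pip install .`, the `mlx_nougat` command should be on PATH.
        "mlx_nougat on PATH": shutil.which("mlx_nougat") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Run it as a script. Any MISSING line identifies the installation step to revisit.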

Usage

After installation, you can use MLX Nougat from the command line:

mlx_nougat --input <path_to_image_or_pdf_or_url> [--output <output_file>] [--model <model_name_or_path>]
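To call the tool from another Python program, you can wrap the usage line above with subprocess. This is a sketch, not an API that MLX Nougat exposes, and the helper names nougat_command and run_nougat are hypothetical. This README does not say where results go when --output is omitted, so run_nougat simply returns whatever the command prints to standard output.

```python
import subprocess
from typing import List, Optional


def nougat_command(input_path: str, output: Optional[str] = None,
                   model: Optional[str] = None) -> List[str]:
    """Build an mlx_nougat argument list that follows the usage line.

    Optional flags are added only when a value is given, so the CLI's own
    defaults apply otherwise.
    """
    cmd = ["mlx_nougat", "--input", input_path]
    if output is not None:
        cmd += ["--output", output]
    if model is not None:
        cmd += ["--model", model]
    return cmd


def run_nougat(input_path: str, output: Optional[str] = None,
               model: Optional[str] = None) -> str:
    """Run the CLI and return its captured standard output."""
    result = subprocess.run(
        nougat_command(input_path, output, model),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, run_nougat("path/to/your/document.pdf", output="results.txt", model="mzbac/nougat-small-8bit-mlx") runs the same command as the quantized-model example with an output file added. Passing the arguments as a list instead of a single string avoids shell quoting problems with paths that contain spaces.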

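Arguments

  --input   Path to a local image or PDF, or the URL of a remote image or PDF (required).
  --output  File to write the OCR results to (optional).
  --model   Model name or local path to use instead of the default, e.g., facebook/nougat-base (optional).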

Examples

  1. Process a local image:

    mlx_nougat --input path/to/your/image.png --output results.txt
  2. Process a local PDF:

    mlx_nougat --input path/to/your/document.pdf --output results.txt
  3. Process a remote image:

    mlx_nougat --input https://example.com/image.jpg --output results.txt
  4. Process a remote PDF:

    mlx_nougat --input https://example.com/document.pdf --output results.txt
  5. Use a different model:

    mlx_nougat --input path/to/your/image.png --model facebook/nougat-base --output results.txt
  6. Use a quantized model:

    mlx_nougat --input path/to/your/document.pdf --model mzbac/nougat-small-8bit-mlx
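Each example above processes one file per call. To convert a folder of PDFs, a small loop can give each document its own output file. This is a sketch that rests on several assumptions. The helper names, the <name>.txt naming scheme and the dry-run default are illustrative choices, not features of mlx_nougat. The loop picks up only files ending in lowercase .pdf directly inside the folder.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional


def batch_commands(pdfs: Iterable[Path], out_dir: str,
                   model: Optional[str] = None) -> List[List[str]]:
    """Build one mlx_nougat invocation per PDF, each writing <stem>.txt into out_dir."""
    out = Path(out_dir)
    cmds = []
    for pdf in pdfs:
        cmd = ["mlx_nougat", "--input", str(pdf),
               "--output", str(out / f"{pdf.stem}.txt")]
        if model is not None:
            cmd += ["--model", model]
        cmds.append(cmd)
    return cmds


def run_batch(src_dir: str, out_dir: str, model: Optional[str] = None,
              dry_run: bool = True) -> None:
    """Convert every *.pdf directly inside src_dir (not recursive)."""
    pdfs = sorted(Path(src_dir).glob("*.pdf"))  # sorted for a stable order
    cmds = batch_commands(pdfs, out_dir, model)
    if dry_run:
        for cmd in cmds:
            print(" ".join(cmd))  # show what would run without running it
        return
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # stop at the first failing file
```

For example, run_batch("papers", "ocr_out", model="mzbac/nougat-small-8bit-mlx") prints the commands for a hypothetical papers folder. Passing dry_run=False runs them. Defaulting to a dry run makes it easy to check the file list before running a long OCR job.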

TODOs

Acknowledgements

This project is built upon several open-source projects and research works: