instructlab / sdg

Python library for Synthetic Data Generation
https://pypi.org/project/instructlab-sdg/
Apache License 2.0

Only use CPU for the docling OCR models #361

Closed (bbrowning closed 1 week ago)

bbrowning commented 1 week ago

GPU memory is extremely tight in many of our supported hardware configurations, and our GitHub Mac CI runners error out when running the OCR models with MPS acceleration. For both reasons, let's explicitly pin the OCR models to the CPU.

See DS4SD/docling#286 for a bit more context.
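
To illustrate the idea, here is a minimal sketch of what "pinning to the CPU" means: the OCR configuration ignores whatever accelerators are detected and always selects the CPU. Everything named here (`OcrDeviceConfig`, `pin_ocr_to_cpu`, the accelerator strings) is hypothetical and is not the actual change in this PR or part of docling's API. In a real pipeline the equivalent step would likely be turning off GPU use in the OCR engine's options (for example, EasyOCR's reader accepts a `gpu` argument), but the exact option names in docling are an assumption and should be checked against the installed version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrDeviceConfig:
    """Hypothetical settings object, not part of docling or instructlab-sdg."""

    device: str
    use_gpu: bool


def pin_ocr_to_cpu(detected_accelerators):
    """Return a CPU-only OCR config, regardless of the hardware detected.

    Also returns the accelerators that were deliberately ignored, which
    can be useful for a log message.
    """
    ignored = sorted(a for a in detected_accelerators if a != "cpu")
    return OcrDeviceConfig(device="cpu", use_gpu=False), ignored


# Example: a machine that reports CUDA and MPS still runs OCR on the CPU.
config, ignored = pin_ocr_to_cpu({"cuda", "mps", "cpu"})
print(config.device, ignored)
```

The point of hard-coding the device rather than auto-detecting it is that OCR then neither competes with other models for scarce GPU memory nor takes the MPS code path that fails on the Mac CI runners.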

bbrowning commented 1 week ago

We'll need to merge this before upgrading our docling dependency past version 2.4.2; otherwise the tests will fail on our Mac CI runners.