instructlab / ui

Place to hack on UI for InstructLab
Apache License 2.0

Host docling service with UI deployment for document conversion #137

Open vishnoianil opened 3 weeks ago

vishnoianil commented 3 weeks ago

A knowledge contribution to the taxonomy repo requires the user to reference the document that provides the context and Q&A for that contribution. The document must be in markdown format so that the backend services can consume it and train the model. The UI provides a way to upload the document and generate the document-related metadata that the knowledge contribution requires, but currently it doesn't do any document conversion if the document is not in markdown format.

Asking users to convert the document with whatever tool they have at their disposal, and to verify that the result was converted properly (in complex scenarios such as huge tables), is a big ask and not an ideal user experience. If the UI can provide a service that converts the knowledge document automatically when the user uploads it, the overall knowledge contribution workflow becomes simpler and easier to use.
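To make the proposed upload behavior concrete, here is a minimal sketch of the decision the UI backend would make on upload: pass markdown through untouched, and send anything else to a converter. The names `needs_conversion`, `prepare_knowledge_document`, and the `convert` callable are hypothetical and only illustrate the flow; they are not existing UI code.

```python
from pathlib import PurePath
from typing import Callable, Tuple

# Extensions treated as "already markdown" (an assumption for this sketch).
MARKDOWN_SUFFIXES = {".md", ".markdown"}


def needs_conversion(filename: str) -> bool:
    """Return True when the uploaded file is not already a markdown file."""
    return PurePath(filename).suffix.lower() not in MARKDOWN_SUFFIXES


def prepare_knowledge_document(
    filename: str,
    content: bytes,
    convert: Callable[[str, bytes], str],
) -> Tuple[str, str]:
    """Return (markdown_filename, markdown_text) for an uploaded document.

    `convert` is a hypothetical hook standing in for whatever conversion
    service ends up being hosted with the UI.
    """
    if not needs_conversion(filename):
        # Already markdown: pass it through without calling the converter.
        return filename, content.decode("utf-8")
    markdown = convert(filename, content)
    # Rename so downstream steps see a markdown file.
    return str(PurePath(filename).with_suffix(".md")), markdown
```

Keeping the converter as an injected callable means the upload path does not depend on any one conversion tool, so the backend can be swapped without touching the UI flow.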

Docling is an open source project that provides a stable implementation of document conversion and does a great job of converting complex documents from PDF to markdown. Hosting docling as a service with the UI deployment, which the UI can call to convert the document, would significantly improve the overall user experience, so this issue is to explore that possibility.
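One possible shape for such a service, as a rough sketch only: a small WSGI app with a single conversion endpoint and a pluggable backend. The `/convert` path, the `X-Filename` header, and the `make_app` helper are all assumptions made for illustration. The `docling_backend` function assumes docling's documented Python API (`DocumentConverter().convert(...)` followed by `export_to_markdown()`) and is not exercised here.

```python
import os
import tempfile
from typing import Callable, Iterable

# Hypothetical backend signature: (filename, raw bytes) -> markdown text.
Converter = Callable[[str, bytes], str]


def docling_backend(filename: str, data: bytes) -> str:
    """Convert a document with docling (assumes docling is installed)."""
    # Imported lazily so the rest of the sketch runs without docling.
    from docling.document_converter import DocumentConverter

    suffix = os.path.splitext(filename)[1] or ".pdf"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "upload" + suffix)
        with open(path, "wb") as fh:
            fh.write(data)
        result = DocumentConverter().convert(path)
        return result.document.export_to_markdown()


def make_app(convert: Converter):
    """Build a WSGI app exposing a hypothetical POST /convert endpoint."""

    def app(environ, start_response) -> Iterable[bytes]:
        is_post = environ.get("REQUEST_METHOD") == "POST"
        if not is_post or environ.get("PATH_INFO") != "/convert":
            start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"not found"]

        # Read the raw document bytes from the request body.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        filename = environ.get("HTTP_X_FILENAME", "upload.pdf")

        try:
            markdown = convert(filename, body)
        except Exception as exc:
            # Report conversion failures to the caller instead of crashing.
            start_response("422 Unprocessable Entity", [("Content-Type", "text/plain; charset=utf-8")])
            return [str(exc).encode("utf-8")]

        start_response("200 OK", [("Content-Type", "text/markdown; charset=utf-8")])
        return [markdown.encode("utf-8")]

    return app
```

To try it locally, `make_app(docling_backend)` could be served with `wsgiref.simple_server.make_server` from the standard library. A real deployment would also need upload size limits, authentication, and timeouts, none of which are covered in this sketch.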

cc: @nerdalert @Gregory-Pereira

vishnoianil commented 4 days ago

Related issue #2