instructlab / ui

Place to hack on UI for InstructLab
Apache License 2.0

Automatically Upload Documents - document name issue #131

Open aevo98765 opened 3 weeks ago

aevo98765 commented 3 weeks ago

When you automatically upload a PDF document, the file name that gets added is the name of the PDF, ending in .pdf. The name actually needed for the YAML submission is the name of the .md file on GitHub. This is potentially misleading to users. We need to discuss the automatic document upload user flow, as I think this will be the route of entry for most users.
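To make the mismatch concrete, here is a minimal sketch of mapping the uploaded PDF name to the markdown name the YAML would need. It assumes the markdown file on GitHub keeps the same base name as the PDF, which nothing in this thread confirms; `markdown_name_for` is a hypothetical helper, not part of the UI code.

```python
from pathlib import PurePosixPath


def markdown_name_for(uploaded_name: str) -> str:
    """Return the assumed .md file name for an uploaded document name.

    Assumption: the markdown file shares the PDF's base name.
    Names that do not end in .pdf are returned unchanged.
    """
    path = PurePosixPath(uploaded_name)
    if path.suffix.lower() == ".pdf":
        # Swap only the extension; drop any directory part.
        return path.with_suffix(".md").name
    return path.name


if __name__ == "__main__":
    print(markdown_name_for("knowledge_doc.pdf"))
```

If the UI displayed the mapped name instead of the raw upload name, the value shown would match what the YAML submission expects, under the same-base-name assumption above.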

@Misjohns @nerdalert @vishnoianil

Misjohns commented 2 weeks ago

@aevo98765 Does InstructLab automatically convert the PDF to markdown, or does the file stay as a PDF and fail to get added properly? Just trying to understand whether this creates errors or whether it's a matter of educating the user that their file is being converted.

vishnoianil commented 2 weeks ago

At this point, the UI doesn't do any conversion from PDF to markdown. However, docling from IBM DeepSearch is an open source project that we can leverage to do that conversion. Currently it's the user's responsibility to convert the document to markdown. I know this is not an ideal user experience, so hosting the docling service with our UI deployment to handle the document conversion would really improve the overall knowledge contribution experience for users. Here is an issue to explore this option: https://github.com/instructlab/ui/issues/137
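For anyone who wants to try the conversion locally in the meantime, here is a rough sketch using docling's Python API. It assumes a recent docling release is installed (`pip install docling`) and that the result object exposes `document.export_to_markdown()`; older releases may differ. `pdf_to_markdown` is a hypothetical helper, not something the UI provides.

```python
from pathlib import Path


def pdf_to_markdown(pdf_path: str, out_dir: str = ".") -> Path:
    """Convert a local PDF to a markdown file and return the new path.

    Assumption: the output keeps the PDF's base name with a .md suffix.
    """
    source = Path(pdf_path)
    if not source.is_file():
        raise FileNotFoundError(f"No such PDF: {source}")

    # Imported lazily so the path check above runs even without docling.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    target = Path(out_dir) / source.with_suffix(".md").name
    target.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return target
```

Calling `pdf_to_markdown("my_doc.pdf")` would write `my_doc.md` to the current directory; the quality of the markdown depends on the PDF and on the docling version.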

Misjohns commented 2 weeks ago

@vishnoianil @aevo98765 @nerdalert There doesn't seem to be an easy way for the user to convert to markdown. Until #137 is completed, do we need to give users instructions for doing this conversion themselves? I found this tool, but I'm not sure we can recommend it: https://products.aspose.app/words/conversion/pdf-to-markdown#:~:text=Upload%20PDF%20files%20to%20convert,in%20Markdown%20format%20for%20viewing.

vishnoianil commented 2 weeks ago

I think for now (until we host a document conversion service) we should leave it to users to pick whatever tool is available to them. Rather than recommending a specific tool, I think we should just suggest open source alternatives they can use. wdyt?