gnthaker opened this issue 3 months ago
Thank you for filing this issue, @gnthaker! It helps us keep track of this. In last night's triage meeting, we discussed that we also want to build better tooling for PDF-to-Markdown conversion and, more generally, make data ingestion less cumbersome. Since we are a young, fast-moving project, I'm not sure when this will land on our roadmap.
Once again, thank you for filing this issue and ensuring we don't lose track of this clear need.
The community members I know who have talked most about this need are on the Triage team. If you want to talk with them about scoping this work, you can find them in #triage on the InstructLab Slack.
Yep, @gnthaker, please reach out; we have some thoughts and suggestions for getting something off the ground, but nothing has been formalized into a pipeline yet.
This issue has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.
It would be good if we could provide PDFs or other unstructured data from which synthetic data can be generated.