Multi-node production GenAI stack. Run the best of open source AI easily on your own servers. Easily add knowledge from documents and scrape websites. Create your own AI by fine-tuning open source models. Integrate LLMs with APIs. Run gptscript securely on the server.
Current PDF text extraction doesn't generate markdown and includes a lot of cruft.
In particular, two-column layouts (common in academic papers) cause absolute mayhem, and I'm surprised the model can make sense of the extracted text at all.
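For context on the two-column problem, here is a rough sketch of column-aware reading order. This is not how our current extractor or marker works. It assumes the extractor already hands us text blocks with bounding boxes (origin at the top-left, y growing downward) and that the page has a single gutter at its horizontal midpoint. `TextBlock` and `reading_order` are hypothetical names for illustration. A naive top-to-bottom sort interleaves left and right paragraphs; grouping by column within each horizontal band avoids that.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    # Hypothetical block type: bounding box with a top-left origin, y grows downward.
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def reading_order(blocks: list[TextBlock], page_width: float) -> list[TextBlock]:
    """Order blocks on a simple two-column page (assumes one gutter at the midline)."""
    mid = page_width / 2
    ordered: list[TextBlock] = []
    left: list[TextBlock] = []
    right: list[TextBlock] = []

    def flush() -> None:
        # Within a band, emit the whole left column before the right column.
        # Both lists were filled in top-to-bottom order, so no re-sort is needed.
        ordered.extend(left)
        ordered.extend(right)
        left.clear()
        right.clear()

    for block in sorted(blocks, key=lambda b: b.y0):
        if block.x0 < mid < block.x1:
            # Block crosses the gutter: treat it as full-width (title, wide figure caption)
            # and close off the band of column text above it.
            flush()
            ordered.append(block)
        elif block.x1 <= mid:
            left.append(block)
        else:
            right.append(block)
    flush()
    return ordered


page = [
    TextBlock(50, 40, 550, 70, "Title"),
    TextBlock(50, 100, 290, 300, "Left para 1"),
    TextBlock(310, 100, 550, 300, "Right para 1"),
    TextBlock(50, 320, 290, 500, "Left para 2"),
    TextBlock(310, 320, 550, 500, "Right para 2"),
]
print([b.text for b in reading_order(page, page_width=600)])
# prints ['Title', 'Left para 1', 'Left para 2', 'Right para 1', 'Right para 2']
```

Real papers break the midline assumption often enough (uneven columns, full-width tables, footnotes), which is part of why a dedicated layout-aware tool seems worth trying.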
https://github.com/VikParuchuri/marker looks like it might do a better job; give it a try.