Closed by HakaishinShwet 2 weeks ago
@HakaishinShwet You are right: this project was developed to produce high-quality corpora. MinerU is well suited both to building pre-training corpora for large models and to RAG applications.
This tool can extract data from complex files, so do you think it is a good solution for extracting content and building a knowledge base for an LLM?