gptscript-ai / knowledge

Knowledge for GPTScript
https://gptscript-ai.github.io/knowledge/
Apache License 2.0

Add testing framework, replace pdf parser (unidoc), support other files #20

Closed StrongMonkey closed 5 months ago

StrongMonkey commented 5 months ago

This PR adds the following things:

  1. An E2E test framework, with test cases from https://github.com/h2oai/enterprise-h2ogpte/blob/main/rag_benchmark/e2e_df.csv
  2. Support for other file types, such as zip.
  3. A replacement of our existing PDF parser with unidoc, for better performance.

Note: not all tests in the E2E test suite pass yet (15 of 115 are failing). Our parser needs further improvement before these test cases pass.