mongodb / chatbot

MongoDB Chatbot Framework. Powered by MongoDB and Atlas Vector Search.
https://mongodb.github.io/chatbot/
Apache License 2.0

filter out very small chunks #160

Closed mongodben closed 1 year ago

mongodben commented 1 year ago

Jira: n/a

Changes

Notes

Example of tiny chunks getting matched:

```
"Chunks found: [
  {
    "sourceName": "snooty-cloud-docs",
    "url": "https://mongodb.com/docs/atlas/atlas-search/tutorial/lookup-with-search/",
    "score": 0.91689532995224,
    "text": "---\ntags:\n - atlas\n - docs\nproductName: MongoDB Atlas\nversion: null\npageTitle: \"How to Run \"\nhasCodeBlock: true\n---\n\n```",
    "tokenCount": 45,
    "updated": "2023-09-01T06:05:07.609Z",
    "metadata": {
      "tags": [ "atlas", "docs" ],
      "productName": "MongoDB Atlas",
      "version": null,
      "pageTitle": "How to Run ",
      "hasCodeBlock": true
    },
    "chunkIndex": 8
  },
  {
    "sourceName": "snooty-cloud-docs",
    "url": "https://mongodb.com/docs/atlas/atlas-search/tutorial/lookup-with-search/",
    "score": 0.91689532995224,
    "text": "---\ntags:\n - atlas\n - docs\nproductName: MongoDB Atlas\nversion: null\npageTitle: \"How to Run \"\nhasCodeBlock: true\n---\n\n```",
    "tokenCount": 45,
    "updated": "2023-09-01T06:05:07.479Z",
    "metadata": {
      "tags": [ "atlas", "docs" ],
      "productName": "MongoDB Atlas",
      "version": null,
      "pageTitle": "How to Run ",
      "hasCodeBlock": true
    },
    "chunkIndex": 5
  },
  ...
]
```

Both matches contain only page front matter (45 tokens each) and no actual page content, yet they score about 0.917.
mongodben commented 1 year ago

@cbush can you take another look? I decided to make this a config option rather than use the transform, since I think it's generalizable enough that it ought to be included in the core functionality.
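For readers following along, here is a rough sketch of what a config-driven minimum-size filter could look like. It is not the code from this PR: the `Chunk` shape is modeled on the fields in the log output above, while `ChunkFilterOptions`, `minChunkTokens`, and `filterSmallChunks` are hypothetical names made up for illustration.

```typescript
// Hypothetical chunk shape, modeled on fields seen in the log output above.
interface Chunk {
  text: string;
  tokenCount: number;
  chunkIndex: number;
}

// Hypothetical config option; the actual option name in the PR may differ.
interface ChunkFilterOptions {
  /** Chunks with fewer tokens than this are dropped. */
  minChunkTokens: number;
}

// Keep only chunks that meet the configured minimum token count.
function filterSmallChunks<T extends { tokenCount: number }>(
  chunks: T[],
  { minChunkTokens }: ChunkFilterOptions
): T[] {
  return chunks.filter((chunk) => chunk.tokenCount >= minChunkTokens);
}

// Example: a front-matter-only chunk (45 tokens) next to a larger chunk.
const exampleChunks: Chunk[] = [
  { text: "---\ntags:\n - atlas\n---\n", tokenCount: 45, chunkIndex: 5 },
  { text: "A longer chunk with actual page content.", tokenCount: 300, chunkIndex: 6 },
];

const keptChunks = filterSmallChunks(exampleChunks, { minChunkTokens: 50 });
console.log(keptChunks.map((chunk) => chunk.chunkIndex));
```

With a threshold of 50 tokens, the 45-token front-matter chunk is dropped before it can be embedded or matched. Exposing the threshold as a config value lets each data source tune it without writing a custom transform.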