smallcloudai / refact-lsp

LSP server for Refact, suitable for Sublime Text and other editors
BSD 3-Clause "New" or "Revised" License

[RAG] Filter out machine-generated files #147

Closed · valaises closed this issue 4 months ago

valaises commented 4 months ago

vecdb indexing gets stuck forever when indexing a project that contains vendored Python packages, node_modules, etc. I'd guess vecdb search quality won't be great either with that much garbage in the index.
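A minimal sketch of the cheapest fix, assuming the indexer sees relative file paths: skip files under well-known vendored or build directories and files with generated-looking suffixes before reading them. The `EXCLUDED_DIRS` and `EXCLUDED_SUFFIXES` lists and the `should_index` helper are hypothetical illustrations, not refact-lsp code.

```python
from pathlib import PurePosixPath

# Hypothetical deny-list of directory names that usually hold vendored
# dependencies or build output (not taken from refact-lsp).
EXCLUDED_DIRS = frozenset({
    "node_modules", "site-packages", "dist-packages", "__pycache__",
    ".venv", "venv", ".git", "build", "dist", "target",
})

# Hypothetical suffixes of files that are typically machine-generated.
EXCLUDED_SUFFIXES = (".min.js", ".min.css", ".map", ".lock", ".pyc")


def should_index(path: str) -> bool:
    """Return False if the path lies under an excluded directory
    or has a generated-file suffix; True otherwise."""
    p = PurePosixPath(path)
    # Check every directory component, but not the file name itself,
    # so that e.g. "docs/build.md" is still indexed.
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    return not p.name.endswith(EXCLUDED_SUFFIXES)
```

In a real directory walker it is cheaper to prune excluded directories as soon as they are encountered rather than test every file beneath them; respecting the project's `.gitignore` would catch many of the same paths.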

valaises commented 4 months ago

There's an existing filtering pipeline in Python that could be useful: https://github.com/smallcloudai/data-collection/blob/main/github/scripts/stage4_diffs/filtration/plain/prefilter.py
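The linked `prefilter.py` is not reproduced here. As an independent illustration of content-based filtering, a check might flag files whose first lines carry a "generated" marker or whose lines are extremely long, as minified bundles are. The thresholds, marker strings, and the `looks_machine_generated` helper are all assumptions to be tuned against real projects.

```python
# Hypothetical thresholds; tune against real repositories.
MAX_LINE_LENGTH = 1000      # a single line longer than this suggests minified output
MAX_AVG_LINE_LENGTH = 200   # a high average suggests generated or packed data
HEADER_LINES = 5            # how many leading lines to scan for markers

# Hypothetical markers often found in headers of generated files.
GENERATED_MARKERS = ("do not edit", "auto-generated", "autogenerated", "generated by")


def looks_machine_generated(text: str) -> bool:
    """Heuristically decide whether file contents look machine-generated."""
    lines = text.splitlines()
    if not lines:
        return False
    # Case-insensitive search for a marker in the first few lines only.
    header = "\n".join(lines[:HEADER_LINES]).lower()
    if any(marker in header for marker in GENERATED_MARKERS):
        return True
    lengths = [len(line) for line in lines]
    if max(lengths) > MAX_LINE_LENGTH:
        return True
    return sum(lengths) / len(lengths) > MAX_AVG_LINE_LENGTH
```

A marker like "generated by" can also appear in hand-written comments, so a production filter would likely combine several signals rather than reject on any single one.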