lunasec-io / lunasec

LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTrace GitHub App: https://github.com/marketplace/lunatrace-by-lunasec/
https://www.lunasec.io/

Normalize Vulnerability Dataset for Buff #1135

Closed: breadchris closed this issue 1 year ago

breadchris commented 1 year ago

The dataset contains a lot of extraneous data (PDFs, excessive tabs/newlines, etc.). Removing it from the data we store will help reduce the cost of ingestion.
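A minimal normalization pass might look like the sketch below. It is only an illustration: the helper names (`is_pdf_payload`, `normalize_text`, `normalize_dataset`) are hypothetical, it assumes each document is a plain string, and it assumes "PDFs" means raw PDF content that ended up in the text (detected by the standard `%PDF-` file header). It drops those entries and collapses runs of tabs, spaces, and blank lines before anything is stored or embedded.

```python
import re
from typing import List


def is_pdf_payload(text: str) -> bool:
    # Hypothetical check: PDF files start with the "%PDF-" magic header,
    # so raw PDF content that slipped into the dataset can be detected.
    return text.lstrip().startswith("%PDF-")


def normalize_text(text: str) -> str:
    # Normalize Windows/old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim leading/trailing whitespace on every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Collapse three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def normalize_dataset(documents: List[str]) -> List[str]:
    cleaned = []
    for doc in documents:
        if is_pdf_payload(doc):
            continue  # drop raw PDF content entirely
        normalized = normalize_text(doc)
        if normalized:  # skip documents that were only whitespace
            cleaned.append(normalized)
    return cleaned
```

Running this before ingestion means fewer characters (and therefore fewer tokens) are sent to the embedding step; the exact filtering rules would need to match what the Buff dataset actually contains.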

breadchris commented 1 year ago

Client code that inserts the documents: https://github.com/getbuff/Buff/blob/main/server/document-upload/gather_documents.py#L30

breadchris commented 1 year ago

LangChain's `Pinecone.add_texts`: https://langchain.readthedocs.io/en/latest/_modules/langchain/vectorstores/pinecone.html#Pinecone.add_texts. Save embeddings to disk.
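One way to save embeddings to disk is a small content-addressed cache in front of whatever embedding call is used, so re-running ingestion only pays for texts that have not been embedded before. The sketch below is a hypothetical helper (`DiskEmbeddingCache` is not part of LangChain or Pinecone): it assumes `embed_fn` is any callable that takes a list of strings and returns one vector (list of floats) per string, and it stores each vector as a JSON file named by the SHA-256 hash of its text.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List, Optional

Embedding = List[float]


class DiskEmbeddingCache:
    """Hypothetical on-disk cache: one JSON file per text, keyed by SHA-256."""

    def __init__(self, directory: str, embed_fn: Callable[[List[str]], List[Embedding]]):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        # Assumed batch embedding function (e.g. a wrapper around an API call).
        self.embed_fn = embed_fn

    def _path(self, text: str) -> Path:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def embed(self, texts: List[str]) -> List[Embedding]:
        results: List[Optional[Embedding]] = [None] * len(texts)
        missing_idx = []
        for i, text in enumerate(texts):
            path = self._path(text)
            if path.exists():
                # Cache hit: load the previously saved vector.
                results[i] = json.loads(path.read_text())
            else:
                missing_idx.append(i)
        if missing_idx:
            # Only call the (paid) embedding function for uncached texts.
            new_vectors = self.embed_fn([texts[i] for i in missing_idx])
            for i, vector in zip(missing_idx, new_vectors):
                self._path(texts[i]).write_text(json.dumps(vector))
                results[i] = vector
        return results  # type: ignore[return-value]
```

The cached vectors could then be passed to the vector store upsert step instead of recomputing them; how that plugs into `Pinecone.add_texts` specifically is left open here.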