Open Keyrxng opened 2 months ago
We should just rename the column to `markup`, then.
We need to test whether embeddings work better with plaintext or with all the markup context.
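A rough way to run that test, sketched under assumptions: embed the same comment once as markup and once as plaintext, then compare each against a query embedding with cosine similarity. The `embed` callback and the `compareFormats` helper are hypothetical names used for illustration only; nothing here reflects the actual embedding provider or code in the repo.

```typescript
// Cosine similarity between two equal-length vectors (1 = same direction, 0 = orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical: whatever function turns text into an embedding vector.
type Embed = (text: string) => Promise<number[]>;

// Embeds a query plus both versions of the same comment and reports
// how close each version lands to the query.
async function compareFormats(
  embed: Embed,
  query: string,
  markup: string,
  plaintext: string
): Promise<{ markup: number; plaintext: number }> {
  const [q, m, p] = await Promise.all([embed(query), embed(markup), embed(plaintext)]);
  return { markup: cosineSimilarity(q, m), plaintext: cosineSimilarity(q, p) };
}
```

Running this over a sample of real comments and queries, rather than a single pair, would give a fairer picture of which format retrieves better.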
Well, we aren't using GPT to create embeddings, so I don't know. But GPT likes markdown, and they used to write the system message in markdown, so I'd assume it's better than plaintext, since you can signify inner context with block quotes, headings, etc. Plaintext over HTML all day long, though, if that was in the mix too.
Input should be properly sanitized and then stored according to the database schema. It appears that markdown is handled correctly but HTML is not, and it should be, since HTML is a supported GitHub comment format.
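As a minimal sketch of what handling HTML could look like, assuming the goal is to drop HTML tags while leaving markdown syntax in place: the `stripHtml` function below is a hypothetical helper, not code from the repo, and a regex pass like this is not a security-grade sanitizer. It will also strip markdown autolinks such as `<https://example.com>` and angle-bracket text outside code fences, so a vetted HTML parser or sanitizer library is likely the better choice in practice.

```typescript
// Small set of entities to decode; anything else is left untouched.
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&nbsp;": " ",
};

// Hypothetical helper: removes HTML from a comment body, keeps markdown as-is.
function stripHtml(input: string): string {
  return input
    // Drop HTML comments entirely.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop script/style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Replace remaining tags with a space so adjacent words do not merge.
    .replace(/<\/?[a-zA-Z][^>]*>/g, " ")
    // Decode the known entities in a single pass (no double decoding).
    .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (m) => ENTITIES[m] ?? m)
    // Collapse runs of spaces/tabs but keep line breaks, which markdown relies on.
    .replace(/[ \t]+/g, " ")
    .replace(/ *\n */g, "\n")
    .trim();
}
```

For example, `stripHtml("<details><summary>Logs</summary>a &amp; b</details>")` yields `Logs a & b`, while a markdown body like `## Title` followed by `> quote` passes through unchanged.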
https://github.com/ubq-testing/generate-vector-embeddings/issues/5