ubiquity-os-marketplace / text-vector-embeddings


Handle HTML input #15

Open Keyrxng opened 2 months ago

Keyrxng commented 2 months ago

Input should be properly sanitized and then stored according to the database schema. Markdown appears to be handled correctly, but HTML is not. It should be, since HTML is a supported GitHub comment format.

https://github.com/ubq-testing/generate-vector-embeddings/issues/5
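For illustration, here is a minimal sketch of what reducing an HTML comment body to plaintext before storage could look like. `htmlToPlainText` and `ENTITIES` are hypothetical names, not part of the plugin, and the regex-based approach is an assumption made for brevity. A real implementation would more likely use a proper HTML parser, since regexes miss edge cases such as `>` inside attribute values.

```typescript
// Hypothetical helper, not the plugin's actual sanitizer.
// Only a handful of common entities are decoded here.
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&nbsp;": " ",
};

function htmlToPlainText(input: string): string {
  return (
    input
      // Drop HTML comments entirely.
      .replace(/<!--[\s\S]*?-->/g, "")
      // Drop <script> and <style> elements together with their contents.
      .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
      // Turn <br> and common block-level closing tags into line breaks.
      .replace(/<br\s*\/?>|<\/(p|div|li|h[1-6]|tr|blockquote|details|summary)\s*>/gi, "\n")
      // Strip any remaining tags. A letter must follow "<", so "a < b" survives.
      .replace(/<\/?[a-zA-Z][^>]*>/g, "")
      // Decode entities in a single pass so "&amp;lt;" becomes "&lt;", not "<".
      .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (match) => ENTITIES[match] || match)
      // Collapse runs of spaces/tabs and tidy whitespace around line breaks.
      .replace(/[ \t]+/g, " ")
      .replace(/ *\n\s*/g, "\n")
      .trim()
  );
}
```

Called on a comment body such as `<details><summary>Logs</summary>line 1<br>line 2</details>`, this keeps the text content and drops the tags, which is the kind of value that could then be stored in a plaintext column.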


0x4007 commented 2 months ago

We should just rename the column to markup then.

We need to test if embeddings work better with plaintext or all the markup context.
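One rough way to run that comparison: embed the same comment in each format and check which variant scores closest to a query that should match it. This is only a sketch. `Embedder` and `compareFormats` are hypothetical names, and the embedding call is passed in as a parameter because the thread does not say which provider or client the plugin uses.

```typescript
// Hypothetical stand-in for whatever embedding client the plugin calls.
type Embedder = (text: string) => Promise<number[]>;

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed each variant of the same comment (e.g. plaintext, markdown, html)
// and score it against the embedding of a query that should match it.
async function compareFormats(
  embed: Embedder,
  query: string,
  variants: Record<string, string>
): Promise<Record<string, number>> {
  const queryVector = await embed(query);
  const scores: Record<string, number> = {};
  for (const [name, text] of Object.entries(variants)) {
    scores[name] = cosineSimilarity(queryVector, await embed(text));
  }
  return scores;
}
```

A single comment proves little either way; the scores would need to be averaged over a reasonable sample of real issue comments and queries before picking a storage format.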

Keyrxng commented 2 months ago

Well, we aren't using GPT for creating embeddings, so idk. But GPT likes markdown, and they used to write the system message in markdown, so I'd assume it's better than plaintext, as you can signify inner context with block quotes, headings, etc. Still, I'd take plaintext over HTML all day long if that was in the mix too.